Introduction

One of the most persistent problems facing education policymakers is the provision of 
highly effective teachers in all of our nation’s classrooms. The increasing demand for high- 
quality teachers has been well documented for at least four decades (National Commission on 
Excellence in Education, 1983; Ingersoll, 2001; Murnane & Steele, 2007). Indeed, of all school- 
level factors related to student learning and achievement, the student’s teacher has consistently 
been shown to be the most important (Goldhaber, 2002; Rockoff, 2004; Rivkin, Hanushek, & 
Kain, 2005). Though high-quality teaching influences student achievement, socio-emotional, and 
labor market outcomes (Aaronson, Barrow, & Sander, 2007; Chetty, Friedman, & Rockoff, 
2014; Goldhaber, 2002; Jackson, 2018; Kraft, 2019; Rockoff, 2004; Rivkin, Hanushek, & Kain, 
2005), teachers vary dramatically in their ability to improve student performance (Rivkin et al., 
2005; Aaronson et al., 2007). The strong causal relationship between teacher quality and student 
outcomes and the substantial variation in teachers’ effectiveness within schools and districts 
make teacher quality an ideal malleable factor for improving schooling outcomes. 

Personnel evaluation policies have been implemented in education for two purposes: to 
incentivize teachers to refine their instructional practices to improve student outcomes; and to 
identify, remediate, and, if necessary, remove the lowest-performing teachers to improve the
quality of the teacher labor force. In 2009, the U.S. Department of Education’s Race to the Top 
(RTTT) competition encouraged states and districts to overhaul their approaches to evaluating 
teachers, setting off a national teacher evaluation reform movement to identify and dismiss low-
performing teachers. In the years that followed, nearly all states (46 of 50), nearly all of the 25
largest school districts (22 of 25), and the District of Columbia (DC) implemented teacher
evaluation reforms (Steinberg & Donaldson, 2016). Among the major changes made to
evaluation systems was the incorporation of high-stakes consequences for low-rated teachers,
such as remediation plans, tenure revocation, and even termination (Steinberg & Donaldson,
2016).

In this paper, we examine the interaction of teacher evaluation and the job protections 
associated with a teacher’s tenure status to identify whether reforming personnel management in 
education increased the exit of low-rated teachers and, ultimately, improved the distribution of 
teacher quality. We do so in the context of Chicago Public Schools (CPS), the nation’s third- 
largest school district, and its teacher evaluation system — Recognizing Educators Advancing 
Chicago Students (REACH) — which was implemented for the first time in the 2012-13 school 
year and was still in place as of the 2020-21 school year. We address the following questions: (1) 
What is the impact of teacher evaluation reform on the exit of low-performing teachers? Does 
the impact vary by tenure status? (2) Does exiting and replacing low-rated teachers improve 
teacher quality? 

CPS is an important context to study the effects of teacher evaluation reform. Chicago’s 
REACH system incorporates high-stakes accountability sanctions, including remediation and 
dismissal, that are tied to the receipt of low evaluation ratings. These design features were 
incorporated into REACH to better differentiate teacher performance and to increase the 
accountability function of teacher evaluation. Because tenured teachers who receive low
evaluation ratings are granted contractual protections that are unavailable to low-rated non-
tenured teachers, we examine the interaction between teacher evaluation reform and the
employment protections granted to more experienced teachers in Chicago. We then
estimate whether the evaluation-induced exit of low-rated teachers improved the distribution of
teacher quality.


First, we show that non-tenured teachers are approximately twice as likely, on average, to
exit CPS as their tenured colleagues with equivalent annual performance ratings, and that the
magnitude of differential exit by tenure status is increasing with lower REACH evaluation 
ratings, suggesting that tenure provides meaningful employment protections even for the lowest- 
performing tenured teachers. Differential exit by tenure status may reflect variation in the extent 
to which non-tenured teachers, who have no more than three years of teaching experience in 
Chicago, are committed to a career in teaching. To avoid conflating a teacher’s commitment 
toward and preferences for a career in teaching with their annual evaluation ratings, we employ a 
regression discontinuity (RD) design to estimate the effect of evaluation on the exit of low- 
performing teachers. RD estimates indicate that receipt of an Unsatisfactory rating increased the 
likelihood that low-rated tenured teachers exited the district by the end of the subsequent school 
year by 50 percent; this substantive increase is driven by the involuntary exit from the district of 
low-rated tenured teachers. And though low-rated non-tenured teachers exit the district at high 
rates, we do not find that their exit is driven by receipt of low evaluation ratings. 

If evaluation successfully removes the lowest-performing teachers, it is critical to
understand the relative quality of the teacher labor supply available to replace them. We show that
the performance of low-rated teachers who exited CPS is significantly worse than the performance 
of replacement teachers — those teachers who are new to a school-specific grade- or subject-level 
cluster of teachers. Leveraging the plausibly exogenous district-determined performance rating 
threshold to instrument for the exit of low-rated teachers, we find that replacing low-rated teachers 
substantively and significantly improved teacher quality, measured by both REACH evaluation 
scores and observations of a teacher’s instructional performance, with suggestive evidence of
improvements in teachers’ value-added contributions to student (math) achievement. Policy
simulations indicate that the quality of the available teacher labor supply is sufficient to support
the removal of more low-performing teachers in Chicago. We estimate that if the evaluation system 
raised the threshold for an Unsatisfactory rating, the share of low-rated teachers could increase by 
as much as fivefold while still realizing improvements in the overall quality of the teacher 
workforce. 

Taken together, the findings from this paper have several policy implications. First, teacher
evaluation has the potential to improve the quality of the teacher workforce through the 
identification and removal of low-rated teachers, and low-rated tenured teachers in particular. 
Second, absent high-stakes consequences such as those embedded in Chicago’s REACH 
evaluation system, low-performing tenured teachers would otherwise remain in the classroom, 
even though student performance would benefit from the replacement of these teachers with 
teachers of higher quality. And while just one percent of Chicago teachers annually receive low 
evaluation ratings, the quality of the available teacher labor supply could support raising the 
benchmark for low performance and thus the exit of significantly more low-performing teachers, 
enabling additional gains in teacher quality districtwide. 

Related Literature 

Recent evidence reveals the potential of teacher evaluation reforms to satisfy the 
accountability function of personnel management — the removal of low-performing teachers from 
the classroom. Evidence from Tennessee, DC, and Houston, three settings that revised their 
teacher evaluation system in the wake of RTTT, shows that evaluation reforms played a 
meaningful role in the removal of low-performing teachers (Dee & Wyckoff, 2015; Cullen, 
Koedel, & Parsons, 2021; Rodriguez, Walker, & Springer, 2020). Additional evidence from DC
finds that the increased exit of low-performing teachers has persisted for many years after the
initial implementation of evaluation reform (Dee, James, & Wyckoff, 2019). And even in the
absence of high-stakes consequences (i.e., dismissal) for low-performing teachers, a pilot 
evaluation system in Chicago that provided new information to school administrators about their 
teachers’ instructional performance increased the exit of low-rated and non-tenured teachers 
from the district; however, because explicit accountability sanctions for low-performance were 
absent from Chicago’s evaluation pilot, there was no commensurate increase in the exit of low- 
rated tenured teachers (Sartain & Steinberg, 2016). Teacher evaluation reform can also shape the 
teacher labor market by influencing who enters the teaching force. Evidence suggests that the 
overall quality of novice teachers has improved nationally in the wake of evaluation reform, even 
though the supply of new teaching candidates has declined over time (Kraft, Brunner, 
Dougherty, & Schwegman, 2020). 

Teacher evaluation reform can also improve teacher and student performance by 
incentivizing teachers to refine their instructional practices. For example, Dee and Wyckoff 
(2015) find that District of Columbia Public Schools (DCPS) teachers at the margin of receiving 
disciplinary action for their low performance improved their performance evaluation scores in 
the subsequent school year. Evidence from Cincinnati shows that teachers’ contributions to 
student achievement growth increased in the wake of performance evaluation, and that these 
improvements were concentrated among teachers who were lower performing prior to the 
performance evaluation cycle (Taylor & Tyler, 2012). In Chicago, a targeted, low-stakes 
evaluation pilot that was randomly assigned across approximately 100 Chicago elementary 
schools improved student achievement, with suggestive evidence that low-performing teachers 
who exited the district were replaced by higher-performing teachers as measured by evaluator
observations of their instructional practice (Sartain & Steinberg, 2016). In Houston, the
introduction of teacher evaluation reform increased exit from the district of the lowest-
performing teachers (those in the bottom quintile of the teacher performance distribution) while
decreasing exit for teachers in the top quintile (Cullen et al., 2021). Yet, most research has 
focused on improvements to individual teachers’ performance rather than the distribution of 
teacher quality, with little evidence on whether exiting the district’s lowest-rated teachers via 
evaluation reform at scale can improve the distribution of teacher quality; we address this 
dimension of evaluation reform herein. 

While teacher evaluation reforms have emphasized greater differentiation of teacher 
performance and greater accountability for low-rated teachers, contracts negotiated between 
school districts and teachers’ unions offer job protections for tenured teachers that may limit the 
impact that evaluation reforms have on improving the quality of the teacher labor force. 

For example, tenured teachers in Chicago with low evaluation ratings are granted institutional 
supports and additional time during which their performance is re-evaluated; in contrast, non- 
tenured teachers can have their contracts non-renewed and therefore be exited from the district at 
any point during their pre-tenure years. Yet the organizational logistics of documenting a
teacher’s low performance and removing low-performing teachers, together with uncertainty about
the quality of the teacher’s replacement, may constrain school principals’ efforts in pursuit of this
option. In fact, Kraft and Gilmour (2017) document that principals often avoid giving teachers low
ratings because of the intensive amount of time required to document the low performance and to 
implement the professional development and improvement plans that low evaluation ratings 
typically trigger, especially for tenured teachers. Principals also report that they tend to avoid 
dismissing low-performing teachers due to concerns about hiring an even lower-quality
replacement teacher from the district’s excess pool of tenured teachers (Kraft & Gilmour, 2017).


Teacher tenure protections have recently been challenged in the courts in California, 
Minnesota, and New York. (See Kraft et al. (2020) for a review of the legal challenges to teacher
tenure.) Plaintiffs often argue that the process of removing low-performing tenured teachers is
unduly onerous, leaving ineffective teachers in the classroom with detrimental effects on student
learning. Plaintiffs also cite equity concerns related to teacher tenure protections since 
disadvantaged students are more likely to be taught by a low-performing teacher; indeed, a host 
of prior evidence finds that lower-performing teachers tend to be systematically assigned to 
lower-achieving and higher-poverty schools and students (Allensworth et al., 2009; Clotfelter, 
Ladd & Vigdor, 2006; Goldhaber, Lavery & Theobald, 2015; Ingersoll, 2001; Kalogrides & 
Loeb, 2013; Kalogrides, Loeb, & Beteille, 2013; Monk, 1987). In some settings, policy changes 
have made the path to tenure more difficult or have removed tenure protections altogether. 
Evidence from New York City and Louisiana indicates that reforms to teacher tenure rules
decreased the share of teachers who received tenure (Loeb, Miller & Wyckoff, 2015) while 
increasing the exit of less-effective teachers (Loeb et al., 2015; Strunk, Barrett & Lincove, 2017). 

This paper contributes to the literature on teacher performance evaluation in a number of 
ways. First, ours is the first study to examine the impact of teacher evaluation reform on the labor
market outcomes of low-performing tenured teachers; indeed, differences in the contractual
protections afforded to tenured and non-tenured teachers in Chicago in the wake of evaluation reform
enable this insight (even as such contractual protections are far from unique to the Chicago
context). Second, this paper offers new evidence on the effect of evaluation reform-induced 
teacher exit on the distribution of teacher quality, as captured by multiple teacher performance 
measures. Third, we show that the available teacher labor supply is sufficient to support not only
raising the benchmark for satisfactory teacher performance but also increasing the share of
low-rated teachers who are exited from Chicago. And while evidence presented in this paper is
consistent with the impact of evaluation reform on the exit of low-rated teachers found elsewhere 
(Dee & Wyckoff, 2015; Cullen, Koedel, & Parsons, 2021; Rodriguez, Walker, & Springer,
2020), this paper presents novel evidence on how potential changes to the design of evaluation 
policies — specifically, the benchmark for low-performance — may result in additional 
improvements in the distribution of teacher quality. 

Teacher Evaluation in Chicago Public Schools

The evaluation reform we study is REACH, a districtwide teacher evaluation policy 
implemented in CPS beginning in the 2012-13 school year. The development and implementation 
of REACH was in response to state legislation in Illinois that required teacher evaluations consist 
of multiple measures of teacher practice, including classroom observations based on a rubric and 
indicators of student growth.¹ Prior to REACH, teachers in CPS were evaluated based on a
“checklist,” where teachers reported receiving little formal feedback on their performance and in 
some cases no formal evaluation via classroom observation of their instructional performance 
(Sartain, Stoelinga, & Brown, 2011). Chicago’s prior evaluation system did little to differentiate 
teacher performance — nearly every Chicago teacher received high evaluation ratings (Sartain et 
al., 2011; Weisberg et al., 2009). 

1 The Illinois Performance Evaluation Reform Act (PERA) was enacted in 2010 in part to strengthen the state’s
Race to the Top application. CPS was an early adopter of PERA-based evaluation reform relative to other districts in
the state. Prior to PERA, CPS had piloted an informal evaluation system using the Danielson Framework for
Teaching to guide classroom observations and pre- and post-observation coaching conversations (Steinberg &
Sartain, 2015). This pilot experience helped to inform some of the state legislation, particularly around the use of
classroom observations.

REACH represented a significant change in how teachers were evaluated and, ultimately,
held accountable for their performance. School principals and assistant principals conduct formal
classroom observations of teachers during the evaluation cycle, which are followed by a post-
observation conference in which the observer provides timely and actionable feedback to teachers.
Information about teacher performance is also provided through measures of student growth,
which is less formative than information provided to teachers during classroom observations. After
each evaluation cycle, teachers receive a ratings report from the district that contains their final
REACH score and the associated REACH evaluation rating (see Table A1 for more detail on the
performance measures and associated weights that together contribute to the construction of a
teacher’s summative REACH evaluation rating). The final REACH score, based on student growth
measures and classroom observation ratings, is binned into four rating categories that comprise a
teacher’s formal REACH evaluation rating: Unsatisfactory (100-209 REACH score points);
Developing (210-284 REACH score points); Proficient (285-339 REACH score points); or
Excellent (340-400 REACH score points). The timing of the evaluation cycle differs based on a
teacher’s tenure status and prior evaluation ratings. (See Table A2 for details on the timing of the
REACH evaluation cycle; see the Chicago Public Schools (2019) teacher evaluation handbook for
more detail on the REACH evaluation process.)
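
As an illustration of the score-to-rating mapping described above, the following minimal sketch bins a final REACH score into its rating category. The function name and the handling of fractional scores (each lower bound is treated as inclusive) are assumptions made for illustration; the district's actual rounding rules are not described here.

```python
def reach_rating(score: float) -> str:
    """Map a final REACH score (100-400) to a rating category.

    Assumption: each category's lower bound is inclusive, so a score
    below 210 is Unsatisfactory, below 285 is Developing, and so on.
    """
    if not 100 <= score <= 400:
        raise ValueError("REACH scores range from 100 to 400")
    if score < 210:
        return "Unsatisfactory"
    if score < 285:
        return "Developing"
    if score < 340:
        return "Proficient"
    return "Excellent"


# Scores on either side of the Unsatisfactory/Developing threshold.
for s in (209, 210):
    print(s, reach_rating(s))
```

Under this sketch, a one-point difference at 209 versus 210 changes the rating category, which is the feature of the system that the later threshold-based comparisons rely on.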

Another important aspect of REACH is the timing of the provision of formal evaluation 
ratings (see Table A2). During the evaluation cycle, teachers receive formal feedback on their 
instructional practice via classroom observations prior to receipt of the final REACH evaluation 
rating. We refer to the ongoing feedback provided to teachers as informal information about a 
teacher’s performance because teachers do not receive their formal REACH evaluation ratings 
during this time. However, teachers likely have sufficient information to estimate their rating since 
classroom observations weigh heavily in the construction of the final REACH evaluation rating 
(see Table A1). In fact, it is not until the next school year (i.e., year t+1), usually in October or
November, that teachers receive their formal evaluation ratings. It is notable that teachers and
school administrators do not receive the official evaluation rating until the subsequent year rather
than at the end of the evaluation year or even during the summer, as occurs in other contexts (e.g., 
DCPS). This lag in the provision of final REACH evaluation ratings could have implications for 
student learning if low-performing teachers remain in the classroom for at least an additional year. 

REACH also codified new incentives to strengthen the system’s accountability function by 
introducing high-stakes consequences tied to the receipt of Unsatisfactory ratings that vary by a 
teacher’s tenure status. Unsatisfactory-rated tenured teachers are placed on a remediation
plan within 30 days of receiving the rating. This remediation plan consists of district and school
supports to help teachers improve their practice. Unsatisfactory-rated tenured teachers also spend
3-4 hours weekly working with a consultant teacher. After 90 school days, Unsatisfactory-
rated tenured teachers receive another formal REACH evaluation rating based only on classroom
observations. If the rating improves to Proficient, the teacher is not subject to layoff; if the rating
does not reach proficiency, the teacher may be dismissed. In contrast, non-tenured teachers with
Unsatisfactory ratings receive no formal support via the evaluation system, and they do not make 
progress toward attaining tenure status. Further, non-tenured teachers, regardless of performance, 
can have their contracts “non-renewed” at will. Finally, unlike the IMPACT evaluation system in 
DCPS, Chicago’s REACH system does not incorporate merit-based awards for high evaluation 
ratings. See the Chicago Public Schools and Chicago Teachers Union (2016) agreement for more
details about the sanctions and supports comprising the evaluation process. 

Data and Sample

We employ administrative data for all CPS teachers in non-charter schools from the 2012-
13 through 2018-19 school years. Personnel data include administrative records for individual
teachers in each school year and contain information on teacher demographics (race, gender, and
birth year), highest level of education attained, National Board certification, and a teacher’s tenure
status.² These records also include the school where the teacher is employed, allowing us to track
movement within and out of the district. Importantly, we also have access to information about the
reason for a teacher’s exit from CPS. Specifically, these data indicate if a teacher’s exit was due
to retirement, voluntary resignation, or any “other reason.” We consider teacher exits coded as
“other reason” to be involuntary, that is, the result of reduction-in-force layoffs, performance-related
layoffs, or non-renewal of non-tenured teachers. In our analysis, we label these exits from CPS as
“involuntary.”

We examine two margins of teacher exit from CPS: any exit and involuntary exit.³ And,
given the timing of the provision of a teacher’s formal evaluation rating (teachers evaluated in
school year t do not receive their formal REACH evaluation rating until fall of school year t+1),
we consider two time points in which teachers might exit CPS following the evaluation year: exit
in year t and year t+1. Thus, the four outcomes of interest are:

• Any Exit (Year t). Any exit by the end of school year t would occur after all evaluative
classroom observations have been conducted but before teachers have received their formal
REACH evaluation rating in the fall of year t+1. We interpret this type of exit from CPS as a
response to informal information about teacher performance.

• Involuntary Exit (Year t). Involuntary exit by the end of school year t would also occur after
all evaluative classroom observations have been conducted but before teachers have received
their formal REACH evaluation rating in the fall of year t+1. A teacher’s involuntary exit by
the end of school year t could occur if the information signal received via the classroom
observation process induces school leaders to dismiss low-performing teachers who do not
have tenure protections. Yet, we would not expect a tenured teacher to exit CPS involuntarily
by the end of year t since they have not yet received their formal evaluation ratings.

• Any Exit (Year t+1). Any exit by the end of school year t+1 includes exit that occurs at the
end of school year t (as described previously) and exit that occurs at the end of school year t+1
after teachers receive their formal REACH evaluation rating in the fall of year t+1. We
interpret this type of exit from CPS as a response to the receipt of the formal REACH
evaluation rating. For tenured teachers, this type of exit would occur after the completion of a
remediation plan associated with the receipt of an Unsatisfactory evaluation rating.

• Involuntary Exit (Year t+1). Involuntary exit by the end of school year t+1 includes
involuntary exit that occurs at the end of school year t (as described previously) and involuntary
exit that occurs at the end of school year t+1 after teachers receive their formal REACH
evaluation rating in the fall of year t+1. We expect this type of exit from CPS to account for
most of the exit of tenured teachers, since tenured teachers who received an Unsatisfactory
evaluation rating in year t would have completed a remediation plan prior to the end of school
year t+1 that requires tenured teachers to earn a Proficient rating to avoid dismissal from CPS.
We later show that, in the context of a regression discontinuity design, tenured teachers rated
Unsatisfactory in year t perform worse in year t+1 than tenured teachers rated Developing and
whose final REACH score placed them just above the 210 Unsatisfactory/Developing
threshold.

2 CPS teachers earn tenure after being employed for three consecutive years each with an evaluation rating above the
Unsatisfactory level. According to a National Council on Teacher Quality review of state tenure policies, most states
award tenure after three years in the profession (Nitler & Gerber, 2020).

3 We can also observe within-district transfers in the administrative data. However, we focus on exit from the district
because we want to understand the potential of evaluation reform to improve the quality of teaching across the
system. Transferring low-rated teachers from one CPS school to another would not result in a shift in the quality
distribution of the teacher workforce in Chicago.
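
The four exit outcomes defined in the list above can be sketched as indicator variables built from a teacher's evaluation year, exit year, and exit reason. This is a minimal illustration only: the record layout, field names, and the coding of exit reasons as strings are hypothetical and do not reflect the structure of the CPS administrative files.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TeacherRecord:
    # Hypothetical layout: school years are coded by their spring year,
    # so 2013 stands for the 2012-13 school year.
    eval_year: int                 # evaluation year t
    exit_year: Optional[int]       # year at whose end the teacher left CPS, or None
    exit_reason: Optional[str]     # "retirement", "resignation", or "other"


def exit_outcomes(rec: TeacherRecord) -> Dict[str, bool]:
    """Build the four exit indicators for one teacher-by-year record."""
    t = rec.eval_year
    left_by_t = rec.exit_year is not None and rec.exit_year == t
    # Exit by the end of year t+1 is cumulative: it includes exit in year t.
    left_by_t1 = rec.exit_year is not None and t <= rec.exit_year <= t + 1
    # Exits coded "other reason" are treated as involuntary.
    involuntary = rec.exit_reason == "other"
    return {
        "any_exit_t": left_by_t,
        "invol_exit_t": left_by_t and involuntary,
        "any_exit_t1": left_by_t1,
        "invol_exit_t1": left_by_t1 and involuntary,
    }


# A teacher evaluated in 2012-13 who leaves for an "other reason" at the end of 2013-14.
print(exit_outcomes(TeacherRecord(eval_year=2013, exit_year=2014, exit_reason="other")))
```

Because the year t+1 outcomes are cumulative, a teacher who exits at the end of year t is counted in both the year t and the year t+1 indicators.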

The administrative data also include teacher evaluation records from the 2012-13 (the first
year of REACH) through 2016-17 school years. Teachers evaluated during the 2016-17 school
year would have received their official ratings in fall 2017. The evaluation data include a teacher’s
formal REACH evaluation rating (Unsatisfactory, Developing, Proficient, or Excellent) from
school year t and the underlying REACH score (ranging from 100 to 400) that determines the
evaluation rating category. These data also include scores for each of the three components of the
REACH evaluation system: classroom observation scores; value-added measures (VAM) in
reading and math; and scores on a district-developed assessment called “performance tasks” (see
Table A1). Classroom observation scores, which are on a 1-4 continuous scale, are aggregated
across multiple components of teaching and multiple observations that occur during the evaluation 
cycle. Value-added measures are calculated annually based on student test scores on the NWEA 
achievement test, are measured in standard deviation units, and are available for teachers of reading 
and/or math in grades 3-8. Performance tasks were developed to satisfy the state requirement that 
all teachers have a student growth component to their evaluation. All teachers administer and grade 
these assessments at the beginning and end of the school year to determine student growth in the 
subject. Prior research has shown that there is little variation in teachers’ performance task scores, 
with almost all teachers scoring highly on this measure (Jiang & Sporte, 2014). 

Sample

Our analytic sample includes all teachers in CPS with a formal REACH evaluation rating 
in any school year during the 2012-13 through 2016-17 period.⁴ While non-tenured teachers are
rated annually, high-rated tenured teachers (those who do not receive an Unsatisfactory or
Developing rating) receive a formal REACH evaluation rating every other school year. We
restrict the sample to teachers who are formally evaluated in any given year and for whom we
observe tenure status. Our analytic sample contains 44,637 teacher-by-year observations
representing 22,172 unique CPS teachers.

4 We excluded from the analytic sample 955 teacher-by-year observations in which the teacher received a
Proficient REACH evaluation rating but the underlying REACH score was missing.

Table 1 summarizes the characteristics of all CPS teachers and the analytic sample, which 
we also disaggregate by tenure status. Overall, in the analytic sample, 76 percent of teachers are 
female (compared to 77 percent of all CPS teachers); 53 percent of teachers are white (compared 
to 50 percent of all CPS teachers), 21 percent are Black (22 percent of all CPS teachers), and 19
percent are Latino (20 percent of all CPS teachers). A majority of teachers in the analytic sample, 62
percent, have a graduate degree (compared to 68 percent of all CPS teachers), and 6 percent hold
National Board certification (compared to 8 percent of all CPS teachers). Further, 56 percent of 
the teacher-by-year observations in the analytic sample are of tenured teachers, though 74 percent 
of all CPS teachers are tenured; this is because non-tenured teachers are evaluated annually, while 
tenured teachers who receive a Proficient or Excellent REACH rating are evaluated every other 
school year. Compared to tenured teachers in the analytic sample, non-tenured teachers are more 
likely to be white, and less likely to have a graduate degree or hold National Board Certification. 
On average, 10 percent of teachers in our analytic sample annually exit CPS (compared to 12 
percent of all CPS teachers). Among tenured teachers in our sample, 7 percent annually exit CPS 
(and 1 percent annually exit involuntarily), while 15 percent of non-tenured teachers annually exit
CPS (and 7 percent annually exit involuntarily). 

<Table 1 about here>

Table 2 (Panel A) shows the distribution of evaluation ratings for the analytic sample, 
including the proportion receiving each of the four formal REACH evaluation ratings. Overall, 1 
percent of the evaluation ratings received were Unsatisfactory and 20 percent were Developing;
non-tenured teachers are more likely to receive Developing ratings (28 percent) than tenured
teachers (15 percent). The fact that very few teachers in CPS received Unsatisfactory ratings is
consistent with the distribution of teacher evaluation ratings across the country, such as in 
Michigan, where 0.5 percent of all teachers statewide were rated ineffective, the lowest of four 
ratings categories, under the state’s recently reformed evaluation system (Drake et al., 2019). Panel 
A also reports the REACH score which underlies the final REACH ratings; teachers’ mean 
(standard deviation) REACH score was 312.7 (38.8); tenured teachers’ mean REACH score was 
319.7 (37.8) compared to 303.8 (38.3) for non-tenured teachers. Recall that the ratings threshold 
below which teachers receive an Unsatisfactory final rating is 210 REACH score points, which is 
approximately 2.5 standard deviations below the average REACH score among teachers in our 
sample. In Panel B of Table 2, we report mean performance scores for teachers’ classroom 
observation and VAM (math and reading) scores. For each of the REACH performance measures, 
tenured teachers received higher scores, on average, than non-tenured teachers, which is consistent 
with prior evidence on the positive returns to teaching experience (Papay & Kraft, 2014; Steinberg 
& Yang, 2020). 
<Table 2 about here> 
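
As a quick arithmetic check on where the Unsatisfactory threshold sits in the score distribution, the distance between the sample mean and the threshold can be expressed in standard deviation units using the rounded Table 2 values quoted above. The variable names are illustrative only.

```python
# Rounded summary statistics for the full analytic sample, as quoted in the text.
mean_score = 312.7   # mean REACH score
sd_score = 38.8      # standard deviation of REACH scores
threshold = 210      # Unsatisfactory/Developing threshold

# Distance from the mean to the threshold, in standard deviation units.
distance_in_sd = (mean_score - threshold) / sd_score
print(round(distance_in_sd, 2))
```

With these rounded inputs the distance works out to roughly 2.6 standard deviations, close to the approximate figure given in the text.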

Evaluation Ratings, Tenure Status and Teacher Exit 

We begin by describing the relationship between teachers’ formal REACH evaluation 
ratings, tenure status, and exit from CPS. Figure 1 shows the likelihood of teacher exit, by tenure 
status, across the distribution of teacher performance (as measured by the REACH score). Each 
panel of Figure 1 shows one of the four exit outcomes separately by tenure status, and the vertical 
lines indicate the three final REACH evaluation rating thresholds: Unsatisfactory/Developing at 
210 REACH score points; Developing/Proficient at 285 REACH score points; and
Proficient/Excellent at 340 REACH score points. Across the four exit outcomes, non-tenured
teachers are always more likely to exit CPS than their tenured colleagues. Specifically, 15 percent
of non-tenured teachers annually exit CPS by the end of year t, of which 7 percent exited CPS
involuntarily; these exit rates for non-tenured teachers compare to an annual exit rate of 7 percent
for tenured teachers, of which 1 percent of tenured teachers exited CPS involuntarily.
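
The three rating thresholds marked in Figure 1 imply a simple mapping from REACH score to final rating. A minimal Python sketch of that mapping follows. The function name `reach_rating` is hypothetical, and assigning a score exactly at a cutoff to the higher rating is an assumption that extends the 210-point rule to the other two thresholds.

```python
from bisect import bisect_right

# Cutoffs taken from the text: 210, 285, and 340 REACH score points.
CUTOFFS = [210, 285, 340]
RATINGS = ["Unsatisfactory", "Developing", "Proficient", "Excellent"]


def reach_rating(score: float) -> str:
    """Map a REACH score to a final rating (hypothetical helper).

    Assumption: a score exactly at a cutoff receives the higher rating.
    """
    return RATINGS[bisect_right(CUTOFFS, score)]


if __name__ == "__main__":
    for s in (188.4, 210, 304.6, 340):
        print(s, reach_rating(s))
```

Under these assumptions, a score of 188.4 maps to Unsatisfactory and a score of 304.6 maps to Proficient.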

Another key difference in the exit patterns of tenured and non-tenured teachers is the 
relationship between a teacher’s likelihood of exit and their REACH evaluation score. Notably, 
the REACH score-exit gradient is steeper among non-tenured teachers with REACH evaluation 
ratings of Unsatisfactory and Developing than among tenured teachers with the same evaluation 
rating, suggesting that, among lower-rated teachers, non-tenured teachers are more likely to exit 
CPS than tenured teachers with the same REACH rating. At the same time, we find no 
discontinuous change in the likelihood of any exit or involuntary exit by the end of school year t
at the three evaluation ratings thresholds for tenured and non-tenured teachers (Figure 1, Panels
A-D). This is unsurprising given that, by the end of year t, teachers have received information
about their instructional performance from classroom observations but have yet to receive their 
final REACH evaluation ratings. 

Yet, at the Unsatisfactory/Developing threshold, there is a discontinuous increase in any 
and involuntary exits by the end of year t+1 among Unsatisfactory-rated tenured teachers; we do
not observe a similar discontinuous jump in teacher exit among Unsatisfactory-rated non-tenured
teachers by the end of year t+1. This provides descriptive evidence that the labor market outcomes
of low-performing teachers depend on both their evaluation ratings and their tenure status. 

<Figure 1 about here> 

We further explore the relationship between teacher ratings, tenure and exit by estimating
variants of the following regression specification:

(1) $Exit_{irst} = \alpha + \beta_1 Tenure_{it} + \sum_{r=1}^{3} \gamma_r Rating^{r}_{it} + \sum_{r=1}^{3} \theta_r (Tenure_{it} \times Rating^{r}_{it}) + f(Score_{it}) + X_{it}\Gamma + \phi_{st} + \varepsilon_{irst}$,

where Exit equals 1 if teacher i with evaluation rating r in school s exits CPS by the end of
school year t (or, alternatively, by the end of year t+1) and zero if teacher i remains employed in
CPS. In separate regressions, Exit refers to two distinct outcomes: any exit and involuntary exit
from CPS. We model exit as a function of a teacher’s tenure status (Tenure) and a series of
indicator variables for a teacher’s formal REACH evaluation rating (Rating) associated with
(though received after the end of) school year t; the omitted reference category is the highest
REACH rating (i.e., Excellent). We interact the tenure variable with the vector of indicator
variables for REACH evaluation ratings, allowing us to test for differential exit between tenured
and non-tenured teachers who receive the same evaluation rating. We further control for a flexible
function of Score, a teacher’s underlying REACH score, including linear and quadratic
polynomials. X is a vector of observable teacher characteristics, including race, gender, birth year,
education level and National Board certification. $\phi_{st}$ is a school-by-year fixed effect that controls
for all common shocks experienced by teachers in the same school and in the same academic year,
thereby restricting comparisons to teachers teaching in the same school-by-year cell; and $\varepsilon_{irst}$ is a
random error term.
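
To illustrate how the tenure main effect and the tenure-by-rating interaction in equation (1) capture differential exit, the sketch below works through a deliberately reduced case: one rating indicator (Unsatisfactory versus all other ratings), no score polynomial, no covariates, and no fixed effects. In that saturated case the linear probability model coefficients equal differences in cell means. The function name and the toy data are hypothetical and are not drawn from the CPS sample.

```python
from statistics import mean


def tenure_gap_by_rating(rows):
    """rows: iterable of (tenured, unsatisfactory, exited) triples of 0/1 values.

    Returns (beta_1, theta) for the saturated linear probability model
        exit = alpha + beta_1*tenure + gamma*unsat + theta*(tenure*unsat),
    which, with no other controls, equal differences in cell exit rates.
    """
    cells = {}
    for tenured, unsat, exited in rows:
        cells.setdefault((tenured, unsat), []).append(exited)
    rate = {key: mean(values) for key, values in cells.items()}
    beta_1 = rate[(1, 0)] - rate[(0, 0)]            # tenure gap, higher-rated teachers
    theta = (rate[(1, 1)] - rate[(0, 1)]) - beta_1  # additional gap among Unsatisfactory
    return beta_1, theta


if __name__ == "__main__":
    # Toy data: 10 teachers per cell with made-up exit counts.
    toy = ([(0, 0, 1)] * 2 + [(0, 0, 0)] * 8 +
           [(1, 0, 1)] * 1 + [(1, 0, 0)] * 9 +
           [(0, 1, 1)] * 6 + [(0, 1, 0)] * 4 +
           [(1, 1, 1)] * 3 + [(1, 1, 0)] * 7)
    b1, th = tenure_gap_by_rating(toy)
    print(b1, th, b1 + th)
```

The sum `beta_1 + theta` is the tenured versus non-tenured exit gap among teachers with the same low rating, which is the kind of quantity the comparisons by rating in Table 3 describe.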

Table 3 summarizes these results; each column presents one of four exit outcomes. There 
is a strong relationship between the probability of exit (any exit and involuntary exit) and teacher
evaluation ratings; teachers with lower ratings are more likely to exit than teachers with higher
ratings. Tenured teachers are also significantly less likely than non-tenured teachers with the same
REACH rating and the same REACH score to exit CPS. Notably, the magnitude of differential
exit by tenure status is increasing with lower evaluation ratings. For example, by the end of year t,
tenured teachers rated Unsatisfactory are 27 percentage points less likely than non-tenured teachers
rated Unsatisfactory to exit CPS, and 41 percentage points less likely to involuntarily exit CPS.
By comparison, tenured teachers rated Proficient are 1.6 percentage points less likely than non-
tenured teachers rated Proficient to exit CPS, and 1 percentage point less likely to involuntarily
exit CPS. By the end of school year t+1 (the year in which a teacher receives the formal REACH
evaluation rating for school year t), teachers rated Unsatisfactory are significantly more likely to
exit CPS than higher-rated teachers, and the magnitudes of the coefficients associated with any exit
in year t+1 (column 3) are nearly identical to those associated with any exit by the end of school
year t (column 1). Yet, there is no statistically significant difference in exit between tenured and
non-tenured teachers rated Unsatisfactory by the end of year t+1, either for any reason (column
3) or involuntarily (column 4), suggesting that the evaluation system’s formal consequences for
low performance compelled the exit of tenured teachers only after teachers (and their school
administrators) received their formal REACH evaluation rating. This is in contrast to the
significant difference in exit (any and involuntary) by the end of year t between tenured and non-
tenured teachers who are rated Unsatisfactory but who have not yet received their formal
evaluation rating (columns 1 and 2). With these stylized results in place, we next turn to whether
the provision of formal REACH evaluation ratings (coupled with the consequences for low- 
performance embedded in the REACH evaluation system) increased the exit of low-rated teachers, 
and the extent to which teacher exit varied by tenure status. 
<Table 3 about here> 

Effects of Evaluation on Teacher Exit 

We employ a regression discontinuity (RD) design to estimate the impact of evaluation,
and, in particular, the timing of the provision of a teacher’s formal evaluation rating, on teacher
exit from CPS. To do so, we exploit plausibly exogenous variation in teachers’ formal REACH
evaluation ratings induced by discrete differences in the final REACH score around the 
Unsatisfactory/Developing ratings threshold. While evaluation ratings in Chicago are based on 
four mutually exclusive ratings categories, a continuous REACH evaluation score underlies the 
assignment of these ratings. Since teachers rated Unsatisfactory and Developing just below and 
above the REACH score threshold (i.e., 210 points), respectively, should have, on average, the 
same observable and unobservable characteristics, we can consider these teachers as good as 
randomly assigned to formal REACH evaluation ratings (Table A3 presents results testing for 
discontinuities in teacher characteristics at the various evaluation rating thresholds; we find no 
consistent evidence of discontinuities in teacher characteristics, particularly at the 
Unsatisfactory/Developing threshold). We leverage this rating assignment mechanism to estimate 
the causal effects of REACH ratings (and, importantly, the corresponding dismissal threats) on 
teacher exit from CPS. Researchers elsewhere have employed a similar strategy to estimate the 
effect of evaluation reform on teacher turnover, retention, and performance (Dee & Wyckoff, 
2015). 

We focus on teachers just above/below the Unsatisfactory/Developing threshold because this
is where teachers face high stakes in terms of remediation and dismissal (see Tables A4 and A5 
for results associated with teachers just above/below the Developing/Proficient and 
Proficient/Excellent thresholds, respectively). We estimate impacts separately for tenured and non- 
tenured teachers since they are subject to different contractual protections associated with low 
ratings that might differentially affect teacher exit from CPS. The RD specification takes the
following form:

(2) $Exit_{it} = \delta \, 1(REACH_{it} < 0) + f(REACH_{it}) + \gamma \, (1(REACH_{it} < 0) \times f(REACH_{it})) + \lambda_t + \mu_{it}$,

where Exit equals 1 if teacher i exits CPS (at all or involuntarily) by the end of school year
t (or, alternatively, by the end of year t+1) and zero if teacher i remains employed in CPS. REACH
is the underlying REACH score for teacher i in school year t that determines a teacher’s
assignment to a REACH rating, which we center at the relevant threshold (210 points for the
contrast between teachers rated Unsatisfactory or Developing; 285 points for the contrast between
teachers rated Developing or Proficient; and 340 points for the contrast between teachers rated
Proficient or Excellent). We include an indicator function, $1(REACH_{it} < 0)$, that equals 1 if teacher
i is below the centered REACH score threshold and 0 if teacher i is at or above it, and
$f(REACH_{it})$, which is a smooth function of a teacher’s centered REACH score. We further
interact teacher i’s REACH score with the indicator function to allow the regression slope to vary
on either side of the relevant ratings threshold. The variable $\lambda_t$ is a year fixed effect and $\mu_{it}$ is a
random error term. In alternative specifications of equation (2), we include X, which is a vector
of teacher characteristics as in equation (1).

We report parametric and nonparametric estimates of the effect of receiving a given final 
REACH evaluation rating on teacher exit from CPS. For the nonparametric estimates, we use one 
common mean square error-optimal bandwidth for each outcome separately for tenured and non- 
tenured teachers using the sharp robust RDD estimator developed by Calonico, Cattaneo, and 
Titiunik (2014). The coefficient of interest on the indicator function is $\delta$, which captures any shift
in teacher exit at the relevant ratings threshold. If, for example, teachers rated Unsatisfactory who
are just below the 210 REACH score points threshold are more likely to exit CPS than teachers
rated Developing who are just above the 210 REACH score points threshold, then we would expect
$\delta$ to be positive and significantly different from zero.
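
The logic of the RD specification can be sketched in a few lines of Python. The example below is a simplified local-linear estimator with a uniform kernel and a fixed bandwidth: it fits separate lines on each side of the centered cutoff and reports the difference in intercepts, which corresponds to the shift in exit at the threshold. It is not the bias-corrected robust estimator of Calonico, Cattaneo, and Titiunik (2014); the function names and the 20-point bandwidth are illustrative assumptions, and year fixed effects and covariates are omitted.

```python
def _ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def rd_jump(scores, exits, cutoff=210.0, bandwidth=20.0):
    """Local-linear estimate of the jump in exit just below the cutoff.

    Fits separate lines on each side of the cutoff (uniform kernel) and
    returns the below-minus-above difference in intercepts at the cutoff.
    The default bandwidth is an illustrative assumption.
    """
    centered = [(s - cutoff, e) for s, e in zip(scores, exits)
                if abs(s - cutoff) <= bandwidth]
    below = [(x, e) for x, e in centered if x < 0]
    above = [(x, e) for x, e in centered if x >= 0]
    a_below, _ = _ols_line(*zip(*below))
    a_above, _ = _ols_line(*zip(*above))
    return a_below - a_above


if __name__ == "__main__":
    # Synthetic exit probabilities with a built-in jump of 0.2 below 210.
    scores = list(range(190, 231))
    exits = [0.4 + 0.2 * (s < 210) + 0.002 * (s - 210) for s in scores]
    print(rd_jump(scores, exits))
```

Allowing the slope to differ on each side mirrors the interaction term in equation (2). In applied work the bandwidth would be chosen by a data-driven procedure rather than fixed in advance.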
Conditions for Causal Inference 

The key assumption underlying the internal validity of the RD design is that assignment of 
teachers to REACH ratings at the ratings threshold is as good as random (Lee & Lemieux, 2010). 
The extent to which principals or teachers can manipulate their REACH score at the margin, thus 
changing teachers’ final REACH ratings, poses a threat to this assumption. For example, principals 
may give struggling teachers the benefit of the doubt and artificially increase their classroom 
observation scores, which account for the majority of a teacher’s final REACH rating. In this way, 
teachers who should have received an Unsatisfactory rating are moved into the Developing 
category. If so, this practice would suggest that particular teachers were able to manipulate their 
formal REACH evaluation rating, calling into question the validity of the RD design. However, 
evidence from Figure 2, which shows the density of the REACH score by teacher tenure status, 
indicates that this type of systematic manipulation around the ratings threshold is unlikely to be of 
concern. Indeed, Figure 2 shows continuity of the assignment variable (i.e., REACH score) at each 
of the three formal REACH evaluation rating thresholds. Further, we find no evidence of 
statistically significant discontinuities at any of the evaluation rating thresholds, both for tenured 
and non-tenured teachers, based on results from a McCrary test (McCrary, 2008). We also provide 
evidence that the assignment of final REACH evaluation ratings for the analytic sample strictly 
complied with the rating thresholds outlined in the CPS-CTU contract (see Figure A1). That is, in
the analytic sample, all teachers with REACH scores below 210 received an Unsatisfactory 
evaluation rating. 
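
A rough intuition for the density check can be conveyed with a much cruder calculation than the McCrary (2008) test: count how many scores fall in a narrow window just below and just above the threshold, and ask whether the split is consistent with 50/50. The Python sketch below does this with an exact binomial test. It assumes a locally flat density, which is a strong simplification (the McCrary test instead fits local linear densities on each side), and the function name and window width are hypothetical.

```python
from math import comb


def density_balance_pvalue(scores, cutoff=210.0, window=5.0):
    """Two-sided exact binomial p-value for equal counts on each side.

    Counts scores within `window` points below and above the cutoff and
    tests whether the split differs from 50/50. The window width is an
    illustrative assumption; this is not the McCrary (2008) estimator.
    """
    below = sum(1 for s in scores if cutoff - window <= s < cutoff)
    above = sum(1 for s in scores if cutoff <= s < cutoff + window)
    n = below + above
    if n == 0:
        raise ValueError("no scores fall inside the window")
    probs = [comb(n, k) / 2 ** n for k in range(n + 1)]
    observed = probs[below]
    # Sum the probability of every split at least as unlikely as the observed one.
    p_value = min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-9)))
    return below, above, p_value


if __name__ == "__main__":
    balanced = [205.5, 206.5, 207.5, 208.5, 209.5,
                210.5, 211.5, 212.5, 213.5, 214.5]
    print(density_balance_pvalue(balanced))
```

A very small p-value with many more scores just above the cutoff than just below it would be the pattern expected if scores were being nudged over the threshold.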


<Figure 2 about here> 
Results

Figure 3 presents graphical evidence on the probability of teacher exit (any and involuntary
exit) in years t and t+1, by tenure status, as a function of the REACH score at the
Unsatisfactory/Developing threshold (Figures A2 and A3 show the distribution of teacher exit at
the Developing/Proficient and Proficient/Excellent thresholds, respectively, by tenure status). By
the end of school year t, there is no evidence that either tenured teachers (Panels A and C) or non-
tenured teachers (Panels B and D) rated Unsatisfactory (and just below the REACH score threshold 
of 210) exit CPS at higher rates than teachers rated Developing who are just above the 210-point 
threshold. This is unsurprising because teachers do not receive their formal REACH evaluation 
ratings until after the start of the next school year. In contrast, by the end of the next school year 
(i.e., year t+1), tenured teachers who are rated Unsatisfactory are much more likely to exit CPS
than teachers rated Developing at the 210-point REACH score margin (Figure 3, Panel E). Among
Unsatisfactory-rated tenured teachers, the likelihood of any exit in year t+1 is approximately 0.60;
this compares to the likelihood of any exit in year t+1 of approximately 0.40 for Developing-rated
tenured teachers at the margin, representing a 50 percent increase in the exit of Unsatisfactory-
rated tenured teachers. This increase in the likelihood of any exit among Unsatisfactory-rated
tenured teachers in year t+1 is very similar in magnitude to the increase in the likelihood of
involuntary exit for the same tenured teachers in year t+1 (Figure 3, Panel G), suggesting that
personnel evaluation can induce the exit of low-rated tenured teachers, but only once a teacher 
receives the binding formal evaluation rating. Thus, even when tenured teachers receive 
information about their instructional performance during the year of evaluation via ongoing 
classroom observations by school administrators, tenured teachers are unlikely to exit CPS unless
required to do so. For non-tenured teachers, the graphical evidence suggests that the receipt of an
Unsatisfactory rating does not induce differential exit from CPS by the end of school year t+1.
This is consistent with the fact that non-tenured teachers do not have the same employment 
protections as tenured teachers and can be exited from CPS regardless of their evaluation rating. 
<Figure 3 about here> 

Next, we generate regression-based estimates of the magnitude and statistical significance 
of the effect of an Unsatisfactory REACH rating on teacher exit from CPS. We present 
nonparametric (Table 4) and parametric (Table 5) estimates (at various bandwidths around the 
evaluation rating threshold) of the effect of receiving an Unsatisfactory rating on teacher exit in 
years t and t+1, separately for tenured (Panel A) and non-tenured (Panel B) teachers.⁵ In Table 4,
nonparametric RD estimates indicate that there is no differential exit from CPS in year t (any or
involuntary exit) for either tenured or non-tenured teachers. These findings further suggest that
low-rated teachers do not respond to informal information about their (low) performance by exiting
CPS by the end of year t, prior to receiving their formal evaluation ratings, even though teachers
know their performance based on classroom observations which largely determine their final 
REACH rating. 

<Table 4 about here> 

However, we find consistent and robust evidence that low-rated tenured teachers are much 
more likely to exit CPS, but only after the receipt of their official REACH evaluation ratings. 
Nonparametric estimates from Table 4 show a 17.9 percentage point increase in the likelihood of
any exit for Unsatisfactory-rated tenured teachers in year t+1; this estimate is robust to the
inclusion of controls for observable teacher characteristics (Table 4, Panel A, columns 5 and 6),
and represents a 50 percent increase in the probability of exit from CPS relative to the
counterfactual mean exit rate of 35 percent. Notably, nearly all of the increase in teacher exit is
involuntary, as shown in columns (7) and (8) of Table 4. In Table 5, parametric RD results show
that the nonparametric RD estimates of teacher exit in year t+1 are robust across multiple
bandwidths of the REACH score for any exit (Table 5, Panel A, columns 5 and 6) and involuntary
exit (Table 5, Panel A, columns 7 and 8). For non-tenured teachers without the contract protections
of their tenured colleagues, we find no evidence that evaluation increased exit among the lowest-
performing teachers by the end of year t+1. Taken together, these findings suggest that evaluation
reform has played a significant role in relaxing the job protections of low-rated tenured teachers
and increasing their exit from Chicago. Indeed, these findings indicate that, in the absence of high-
stakes teacher evaluation with binding job dismissal stakes, low-rated tenured teachers would
likely remain in the classroom.
<Table 5 about here>

5 While we focus on teachers at the Unsatisfactory/Developing ratings threshold because remediation and dismissal
stakes are tied to an Unsatisfactory REACH evaluation rating, we also present nonparametric RD results for teachers
at the Developing/Proficient and Proficient/Excellent ratings thresholds (see Tables A4 and A5, respectively).

Tenured teachers rated Unsatisfactory are placed on professional development and 
remediation plans that provide them with additional instructional support. Thus, it is possible that 
their performance improved and the dismissal of these teachers by the end of the school year would 
ignore any contemporaneous improvements in performance. To assess whether the performance 
of Unsatisfactory-rated tenured teachers improved in the subsequent school year, we implement 
an RD approach similar to the one above, but in this case the outcome is teacher performance (in year t+1)
on the final REACH and classroom observation scores, rather than teacher exit. We find that the
performance of Unsatisfactory-rated tenured teachers not only did not improve, but declined in the
year after evaluation (i.e., in the year in which they received their formal Unsatisfactory REACH
evaluation rating). Compared to tenured teachers just above the 210 REACH score point threshold
who received a Developing rating, tenured teachers who received Unsatisfactory evaluation ratings
were significantly lower performing (see Table 6). The marginal Unsatisfactory-rated tenured
teacher had a REACH score at the end of year t+1 that was 58 points below the marginal
Developing-rated tenured teacher, which represents approximately a 1.5-standard deviation
decline in performance on the REACH score. We further find that the marginal Unsatisfactory-
rated tenured teacher’s classroom observation scores at the end of year t+1 were 0.42 points lower
at the Unsatisfactory/Developing threshold, corresponding to an approximately 1-standard 
deviation decline in the measure of instructional performance. Thus, these findings indicate that 
low-rated tenured teachers were unable to improve their performance even after receipt of their 
formal evaluation rating and contractually obligated professional development supports. 
<Table 6 about here> 
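
The conversion from REACH score points to standard deviation units is simple division, sketched below in Python. The sketch assumes the relevant scale is the 38.8-point standard deviation reported for all teachers in Panel A of Table 2; the text does not state which standard deviation underlies each conversion, so this choice and the helper name `in_sd_units` are assumptions.

```python
REACH_SD = 38.8  # assumed scale: all-teacher REACH score SD reported with Table 2


def in_sd_units(points: float, sd: float = REACH_SD) -> float:
    """Express a REACH score difference in standard deviation units."""
    return points / sd


if __name__ == "__main__":
    # 58-point gap at the Unsatisfactory/Developing margin in year t+1
    print(round(in_sd_units(58), 2))  # prints 1.49
```

Under that assumption, the 58-point gap is about 1.49 standard deviations, in line with the approximately 1.5 reported above.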

Teacher Labor Supply and the Impact of Exiting and Replacing Low-Rated Teachers 

As we have shown, the REACH evaluation system successfully increased exit from 
Chicago of the lowest-performing teachers (though, with a year lag from the year of evaluation for 
tenured teachers). Yet, if the teacher labor supply available to replace exited teachers is no more 
effective, on average, than the low-rated teachers who exited CPS, then the policy of dismissing 
low-rated teachers would not have its intended effect — improving the overall distribution of 
teacher quality in Chicago. In this section, we begin by comparing the average performance of the 
low-rated (i.e., Unsatisfactory-rated) teachers who exited CPS to the performance of those teachers 
who replaced them (i.e., replacement teachers). Notably, we do not observe each
exited/replacement pair of teachers; for example, if a low-rated 5th-grade teacher exited a CPS
school, we do not observe the specific teacher who replaced the low-rated teacher in the same 5th-
grade classroom in the next school year. Yet, given the richness of our teacher-level data, we are
able to place all teachers in a unique school-specific cluster; we define clusters at either the
school*grade or school*subject level. Doing so allows us to locate and compare the performance
of all teachers in a unique school-specific cluster that contained at least one Unsatisfactory-rated
teacher. In the 5th-grade example, that means we can observe the performance of all 5th-grade
teachers in the school in the next year. Across the 417 clusters in our data that contain at least one 
Unsatisfactory-rated teacher, the average cluster contains 4.4 (s.d. = 4.23) teachers (see Figure A4 
for the distribution of cluster size). 
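
The cluster construction described above can be sketched as a grouping step followed by a filter. The Python example below groups teacher records by school and grade and keeps clusters with at least one score below the 210-point cutoff. The record keys (`school`, `grade`, `reach_score`), the function name, and the toy records are hypothetical; the same logic would apply to school-by-subject clusters.

```python
from collections import defaultdict
from statistics import mean


def clusters_with_unsatisfactory(teachers, cutoff=210.0):
    """Group teacher records into school-by-grade clusters.

    teachers: iterable of dicts with hypothetical keys 'school', 'grade',
    and 'reach_score'. Returns a dict mapping (school, grade) to the list
    of scores, keeping only clusters that contain at least one score
    below the Unsatisfactory cutoff.
    """
    clusters = defaultdict(list)
    for t in teachers:
        clusters[(t["school"], t["grade"])].append(t["reach_score"])
    return {key: scores for key, scores in clusters.items()
            if any(s < cutoff for s in scores)}


if __name__ == "__main__":
    toy = [
        {"school": "A", "grade": 5, "reach_score": 195.0},
        {"school": "A", "grade": 5, "reach_score": 310.0},
        {"school": "A", "grade": 6, "reach_score": 320.0},
        {"school": "B", "grade": 5, "reach_score": 205.0},
    ]
    kept = clusters_with_unsatisfactory(toy)
    print(kept)
    print(mean(len(scores) for scores in kept.values()))  # average cluster size
```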

Table 7 shows the performance scores of Unsatisfactory-rated teachers (in the year in 
which they received an Unsatisfactory rating) and replacement teachers (in their first year in a 
school*grade or school*subject cluster containing at least one Unsatisfactory-rated teacher). For 
replacement teachers, we disaggregate performance scores for those who are in their first year in 
a CPS school (New to CPS), those who moved from another school within CPS (From another 
CPS school), and those who changed assignments within the same CPS school (From same 
school). On average, replacement teachers are much higher performing than Unsatisfactory-rated 
teachers across multiple teacher performance measures, including the REACH score and the two 
primary components of the REACH score — classroom observations and VAM. Replacement 
teachers score 304.6 points, on average, on the REACH score compared to 188.4 REACH score 
points for Unsatisfactory-rated teachers; this difference corresponds to approximately 2.7 standard 
deviations of the REACH score. And, while New to CPS replacement teachers are slightly lower
performing than replacement teachers who moved within CPS (or within the same school) to a
school-specific cluster containing a low-rated teacher, the performance of New to CPS replacement
teachers is significantly better than that of Unsatisfactory-rated teachers, including those who exit by the
end of school year t and those who remain in CPS in the year after they receive an Unsatisfactory
rating (i.e., year t+1).
<Table 7 about here> 

These descriptive patterns indicate that the average performance of replacement teachers 
is significantly better than the average performance of Unsatisfactory-rated teachers. Yet, to what 
extent does CPS’s policy of dismissing low-rated teachers lead to marginal improvements in 
teacher quality in Chicago? To answer this policy-relevant question, we leverage the plausibly 
exogenous district-determined performance rating threshold in an instrumental variables (IV) 
framework to estimate the impact of exiting and replacing Unsatisfactory-rated teachers on teacher 
quality within school-specific clusters. We specify the IV approach in the following two-stage
least squares (2SLS) system of equations:

(3) $ExitCPS_{ict} = \beta \, 1(REACH_{ict} < 210) + \gamma \, TeacherQuality_{ct} + X_{it}\theta + \lambda_t + \mu_{ict}$.

In equation (3), the first-stage outcome is $ExitCPS_{ict}$, which indicates whether teacher i
located in school-specific cluster c exited CPS by the end of school year t. The exogenous
instrument is the district-determined performance rating threshold defined by the indicator
function, $1(REACH_{ict} < 210)$, which equals 1 if teacher i in cluster c receives a final REACH score
in school year t that is below 210, the threshold below which teachers receive an Unsatisfactory
final REACH rating (and 0 if teacher i is at or above 210 points on the final REACH score). The
variable TeacherQuality is the mean performance (REACH score, classroom observation score,
or VAM) of all teachers located in school-specific cluster c during school year t; X is a vector of
teacher-level characteristics, including gender, race/ethnicity, educational attainment, National
Board Certification, birth year, and tenure status. The variables $\lambda$ and $\mu$ represent year fixed effects
and a random error term, respectively. In alternative versions of this first-stage equation, we
replace $ExitCPS_{ict}$ with $ExitCPS_{ic,t+1}$, which indicates whether teacher i located in school-
specific cluster c exited CPS by the end of school year t+1 (i.e., one full school year after receipt
of final REACH rating in school year t). We cluster standard errors at the school level.


The second-stage equation is specified as:

(4) $TeacherQuality_{c,t+1} = \delta \, \widehat{ExitCPS}_{ict} + \tau \, TeacherQuality_{ct} + X_{it}\rho + \lambda_t + \varepsilon_{ct}$.

In equation (4), the second-stage outcome is the mean performance (REACH score,
classroom observation score, or VAM) of all teachers located in school-specific cluster c during
school year t+1 (the school year after at least one teacher in cluster c received an Unsatisfactory
rating for school year t). The inclusion of lagged teacher quality (TeacherQuality) on the right-
hand side allows us to estimate the marginal effect of policy-induced exit from CPS of a low-rated
teacher from school-specific cluster c on teacher quality in the same cluster, which is captured by
the parameter estimate $\delta$. As previously discussed, the design of the CPS dismissal policy results
in a lag in the provision of final evaluation ratings (i.e., REACH ratings for year t are provided to
schools and teachers in the fall of school year t+1); thus, in alternative specifications of the second-
stage equation we replace the outcome $TeacherQuality_{c,t+1}$ with $TeacherQuality_{c,t+2}$ to
further examine the implications of the timing of evaluation ratings and the exit of low-rated
teachers on teacher quality. All other variables are defined as in equation (3).
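
With a single binary instrument and a single endogenous regressor, the 2SLS estimate reduces to a ratio of covariances (the Wald estimator). The Python sketch below shows that reduced case only: it omits lagged teacher quality, teacher covariates, year fixed effects, and clustered standard errors, all of which the specification above includes. The function name and the synthetic data are hypothetical.

```python
def wald_iv(z, x, y):
    """Just-identified IV slope: cov(z, y) / cov(z, x).

    z: instrument (1 if REACH score < 210, else 0)
    x: endogenous regressor (1 if the teacher exited CPS, else 0)
    y: outcome (cluster mean teacher quality in the following year)
    """
    n = len(z)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    cov_zx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    cov_zy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    if cov_zx == 0:
        raise ValueError("instrument is unrelated to exit; first stage is zero")
    return cov_zy / cov_zx


if __name__ == "__main__":
    # Synthetic data in which exit raises next-year cluster quality by 40 points.
    z = [1, 1, 1, 1, 0, 0, 0, 0]
    x = [1, 1, 1, 0, 0, 0, 0, 1]
    y = [300 + 40 * xi for xi in x]
    print(wald_iv(z, x, y))
```

The numerator is the reduced-form relationship between falling below the threshold and later cluster quality, and the denominator is the first-stage relationship between falling below the threshold and exit.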

Table 8 presents the IV results. We find consistent and robust evidence that exiting low- 
rated teachers improves teacher quality within school-specific grade- or subject-level clusters in 
school years subsequent to the exit of a low-rated teacher from the same cluster. For the REACH 
score, average teacher quality improves between year t and year t+1 by 48.6 points, approximately
1.2 standard deviations of the teacher-level REACH score, with the exit of a low-rated teacher by
the end of school year t (see Table 8, Panel A). Notably, the estimated impact of exiting a low-
rated teacher by the end of school year t is nearly identical (49.3 REACH score points) when we
extend the time period one additional year (see Table 8, Panel B); this suggests that low-rated
teachers are replaced by much higher quality replacement teachers within the same grade or subject
cluster, and these positive impacts on teacher quality persist for at least another school year in the
absence of the low-rated teacher.

Further, we find that the pattern of results describing the impact of exiting low-rated
teachers on the summative REACH score is qualitatively the same when teacher quality is
measured by classroom observation scores; this result is unsurprising since the vast majority of a
teacher’s final REACH score is based on observations of a teacher’s instructional performance in
the classroom (see Table A1). Finally, we find suggestive evidence that the exit of low-rated
teachers improves teacher quality based on the cluster-specific mean value-added contribution to
student math achievement (math VAM) by year t+2. We do not find any evidence that exiting
low-rated teachers improves mean reading VAM in the cluster.


<Table 8 about here> 

Policy Simulation: Changing the Performance Standard for Unsatisfactory Teaching 

Thus far, we have established that replacement teachers are considerably higher performing 
than Unsatisfactory-rated teachers located in the same grade-by-school or subject-by-school 
settings, and that the policy-induced exit of low-rated teachers significantly improves cluster- 
specific teacher quality. However, the extent to which the evaluation policy can shift the overall 
distribution of teacher quality in Chicago is limited by the fact that, under the current REACH 
system, just 1 percent of CPS teachers are annually rated Unsatisfactory. In this section, we
consider the implications of changing the threshold for an Unsatisfactory rating in ways that result
in a greater share of CPS teachers identified as low performing.

We first examine the distribution of teachers’ REACH evaluation ratings and performance
measures (classroom observation and VAM scores) below three different thresholds for low- 
performance: 210 points (the current Unsatisfactory rating threshold); 230 points; and 250 points. 
Table 9 summarizes these results (we focus our discussion on tenured teachers; results for non- 
tenured teachers are qualitatively similar and are also presented in Table 9). Under the current 
Unsatisfactory threshold (i.e., 210 points), 239 tenured teachers received Unsatisfactory ratings 
during the 2012-13 through 2016-17 study period. Of the 417 tenured teachers below the 230-
point threshold, 56 percent received an Unsatisfactory rating under the current policy (i.e., 210-
point threshold) while 44 percent received a Developing rating. Of the 900 tenured teachers below
the 250-point threshold, 26 percent received an Unsatisfactory rating under the current policy
while 74 percent received a Developing rating. As expected, teachers’ performance scores are
monotonically increasing as the performance threshold increases, and the share of all CPS tenured
teachers below the threshold would increase from 1 percent (at 210 points) to 2 percent (at 230
points) to 4 percent (at 250 points). Moreover, even at a threshold of 250 points, the performance
scores of tenured teachers below the threshold are well below the districtwide mean. Specifically, the mean REACH
score among tenured teachers below the 250-point threshold is 222.5 points; this is approximately
2 standard deviations lower than the districtwide mean of 312.7 REACH score points (see Table
2). We similarly find significant differences in the classroom observation and VAM scores of
tenured teachers below a 250-point threshold compared to the CPS districtwide mean.
<Table 9 about here> 
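
The threshold comparison summarized in Table 9 can be sketched as a small tabulation. For each candidate cutoff, the Python example below reports how many scores fall below it, their share of all scores, the share already below the current 210-point cutoff, and the mean score of the flagged group. The function name and the toy scores are hypothetical and are not the CPS data.

```python
def threshold_summary(scores, thresholds=(210, 230, 250), current=210):
    """Summarize who falls below each candidate Unsatisfactory threshold.

    For each threshold, reports the number of scores below it, their share
    of all scores, the share already below the current cutoff, and the
    mean score of the flagged group.
    """
    out = {}
    for thr in thresholds:
        flagged = [s for s in scores if s < thr]
        if not flagged:
            out[thr] = {"n": 0, "share": 0.0, "share_current": None, "mean": None}
            continue
        out[thr] = {
            "n": len(flagged),
            "share": len(flagged) / len(scores),
            "share_current": sum(s < current for s in flagged) / len(flagged),
            "mean": sum(flagged) / len(flagged),
        }
    return out


if __name__ == "__main__":
    toy_scores = [200, 205, 220, 240, 300, 310, 320, 330, 340, 350]
    for thr, row in threshold_summary(toy_scores).items():
        print(thr, row)
```

As the cutoff rises, `share_current` falls because a growing fraction of the flagged group would have been rated Developing under the current policy.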


Lastly, we examine whether and to what extent the available teacher labor supply is higher
performing than the low-performing teachers identified at each of the three thresholds. We focus
on the performance of new-to-CPS teachers because these teachers best represent the relative
quality of the teacher labor supply available to replace low-rated teachers, exclusive of the current
stock of CPS teachers. Figure 4 shows the distribution of the difference in performance scores 
between new-to-CPS replacements and teachers who scored below the three thresholds. We restrict 
the performance comparison of replacement and low-performing teachers to within the same 
school settings. In Figure 4, a value of 0 indicates that the average replacement teacher has the 
same performance as the low-performing teacher in the same school; positive values indicate the 
replacement teachers are higher performing, while negative values indicate the replacement 
teachers are lower performing. Panel A shows the distribution of the difference in final REACH 
scores between replacement teachers and low-performing teachers. At the current threshold of 210, 
there is only 1 instance where the low-performing teacher received a higher score than new-to- 
CPS teachers in the same school. Even at the 250-point threshold, 96 percent of the distribution is 
to the right of 0. This pattern, in which the overwhelming share of new-to-CPS teachers is higher
performing than low-rated teachers, holds across the different performance measures and
ultimately reveals that the quality of the available teacher labor supply is sufficient to
accommodate raising the standards for teacher performance in Chicago and thus increasing the
threshold for an Unsatisfactory performance rating.
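The shares described above can be reproduced from the counts reported in the notes to Figure 4. The sketch below is illustrative only: the function and variable names are hypothetical, and it treats the stated number of teachers below each threshold as the denominator for every metric, which is an assumption because not every teacher has each performance measure. It also ignores any exact ties at 0.

```python
# Counts transcribed from the notes to Figure 4.
# BELOW_THRESHOLD: teachers scoring below each candidate threshold.
# REPLACEMENT_LOWER: cases where the average new-to-CPS replacement
# scored below the low-rated teacher in the same school.
BELOW_THRESHOLD = {210: 355, 230: 806, 250: 1810}
REPLACEMENT_LOWER = {
    210: {"reach": 1, "observation": 10, "vam_math": 10, "vam_reading": 11},
    230: {"reach": 8, "observation": 42, "vam_math": 33, "vam_reading": 34},
    250: {"reach": 66, "observation": 207, "vam_math": 77, "vam_reading": 98},
}


def share_not_lower(threshold: int, metric: str) -> float:
    """Share of comparisons in which the replacement is not lower performing.

    Assumes the same denominator for every metric (see lead-in).
    """
    n_below = BELOW_THRESHOLD[threshold]
    return 1 - REPLACEMENT_LOWER[threshold][metric] / n_below


if __name__ == "__main__":
    for threshold in sorted(BELOW_THRESHOLD):
        shares = {
            metric: round(share_not_lower(threshold, metric), 3)
            for metric in REPLACEMENT_LOWER[threshold]
        }
        print(threshold, shares)
```

For the 250-point threshold and the final REACH score, this gives 1 - 66/1,810, which rounds to the 96 percent cited in the text.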


<Figure 4 about here> 
Conclusion 
In this paper, we examined the impact of teacher evaluation reform on the exit of low-
performing teachers from Chicago Public Schools, with particular interest in the potentially
differential effect by a teacher’s tenure status and the consequences for changes in teacher
quality. Indeed, tenured teachers in Chicago benefit from contractual protections unavailable to
their non-tenured colleagues. Tenured teachers who receive Unsatisfactory ratings under the
REACH evaluation system are provided intensive professional development and support and are
afforded an opportunity to demonstrate instructional improvement prior to facing dismissal. In 
contrast, non-tenured teachers can have their contracts terminated at any time. Though we find 
that tenured teachers are significantly less likely, on average, to exit Chicago than equivalently 
rated non-tenured teachers, regression discontinuity estimates indicate that receipt of an 
Unsatisfactory evaluation rating increased the exit of tenured teachers from Chicago by 50 
percent, and this increase is driven by their involuntary exit from the district. And while low- 
rated non-tenured teachers exit Chicago at high rates, we find no evidence that the evaluation
system itself induced that exit; this is likely because principals can remove a non-tenured
teacher from the school at any time.

Our findings reveal that the timing of the provision of a teacher’s evaluation rating is 
consequential for determining when low-rated tenured teachers exit the district. Under Chicago’s 
REACH system, teachers and their school administrators do not receive final teacher evaluation 
ratings until well into the fall of the subsequent school year. This contrasts with teacher 
evaluation systems in other urban districts, such as DC Public Schools, where teachers receive 
their final evaluation ratings prior to the start of the next school year (Dee & Wyckoff, 2015). 
One consequence of this evaluation system feature is that low-rated teachers, particularly tenured 
teachers, remain in the classroom for at least an additional year. At the same time, this feature 
also affords us the opportunity to examine whether teachers who will receive Unsatisfactory 
ratings voluntarily exit the classroom in response to informal information about their 
performance received during the annual evaluation process (i.e., from observations of their 
classroom instruction) or delay exit for a year and exit involuntarily only after receiving their
official REACH evaluation rating. We find that the increase in exit of low-rated tenured teachers
occurs only after the receipt of the official rating, and is driven almost entirely by their
involuntary exit (i.e., dismissal) from the district. Thus, the exit of low-rated tenured teachers 
hinges on the receipt of their formal Unsatisfactory rating, and the consequence of the delay in 
the receipt of evaluation ratings is that low-performing teachers remained in the classroom for at 
least an additional school year. And because low-performing tenured teachers were retained for an
additional school year, student achievement likely suffered. Indeed, instrumental variables
estimates show that exiting and replacing low-performing teachers significantly increased 
teacher quality within school-specific clusters across multiple dimensions of teacher 
performance, including final evaluation scores, instructional performance and value-added 
contribution to student math achievement. 

Since an overwhelming share of a teacher’s REACH rating depends on classroom
observation scores, it is notable that there are documented concerns related to this particular
measure of teacher quality. Recent evidence has shown that the characteristics of a teacher’s
students influence the measurement of teacher quality via classroom observation ratings
(Campbell & Ronfeldt, 2018; Gill et al., 2016; Steinberg & Garrett, 2016; Whitehurst et al.,
2014). For example, teachers who were randomly assigned lower-achieving students received
lower classroom observation scores (Steinberg & Garrett, 2016). In Chicago, the context that we
study, Black teachers receive lower classroom observation scores, on average, than their non- 
Black peers (Jiang & Sporte, 2016; Steinberg & Sartain, 2021); these race-specific differences 
are largely attributable to differences in the school contexts in which Black and non-Black 
teachers typically teach, and not to differences in teacher quality as measured by a teacher’s 
value-added contribution to student achievement growth (Steinberg & Sartain, 2021). These
race-specific differences in teacher ratings have also been recently documented elsewhere
(Bailey et al., 2016; Drake et al., 2019; Jones et al., 2021; Vaznis, 2013a, 2013b). These
observed patterns in teacher ratings could lead to unintended consequences, such as the 
disproportionate exit of teachers of color. Thus, school districts must ensure that evaluation 
systems accurately measure teacher quality. Indeed, policy-specific efforts to evaluate teacher
performance more fairly and equitably are necessary to avoid the possibility that personnel
evaluation adversely and differentially affects teachers in different school settings.

Another potential concern with high-stakes evaluations is that low-performing teachers 
may not be provided sufficient opportunity to improve their instructional performance. As a 
result of the time lag in providing final REACH evaluation ratings to teachers, we are able to 
examine whether low-rated teachers improve when provided instructional supports (and the time 
to respond to such supports). We find that the instructional performance of Unsatisfactory-rated
tenured teachers who remain in Chicago for an additional year not only fails to improve but
declines compared to teachers just above the district-determined threshold for an Unsatisfactory
rating. Moreover, school administrators may avoid dismissing low-performing teachers due to 
concerns about whether the available teacher supply is of sufficient quality to replace exited 
teachers (Kraft & Gilmour, 2017). Evidence herein indicates this concern is unfounded, at least 
in the case of Chicago. We find that the instructional quality of the available teacher labor supply 
in Chicago is sufficient not only to support replacing existing low-rated teachers, but also to 
expand the share of teachers receiving Unsatisfactory ratings and who are therefore subject to 
dismissal. 

As reforms to teacher evaluation systems have rolled out across the country, few teachers 
are annually identified for instructional improvement or removal from the classroom due to low
performance. A systematic review of states that have recently implemented teacher evaluation
reforms finds that fewer than one percent of teachers have been identified as low-performing
(Kraft & Gilmour, 2017); thus, the identification of low-performing teachers has changed little in 
the decade since the national movement to reform teacher evaluation began. In fact, one of the 
barriers to improving the quality of the teacher workforce via personnel management is the 
continued lack of identification of low-performing teachers whose practices are detrimental to 
student learning. This pattern holds in Chicago, where just 1 percent of teachers are
identified as Unsatisfactory each year. Evidence from this paper shows that while the potential for
evaluation systems to shift the distribution of teacher quality has yet to be fully realized, changes 
to existing evaluation policies can accomplish this by increasing the performance standard for 
unsatisfactory teaching. 

Ultimately, our findings reveal the important role that the design of evaluation systems
plays in determining both who is deemed low-performing and when low-performing teachers are
subject to dismissal. Thus, education leaders and policymakers in Chicago and
elsewhere should consider refining two important design features of teacher evaluation systems:
the standard for low performance and the timing of evaluation ratings. In doing so, systems of
evaluation may successfully satisfy their two primary objectives: improving teacher quality and
student achievement.
Tables & Figures
Table 1. Teacher Characteristics 


                                             Analytic Sample
                             All CPS    All        Tenured    Non-tenured
                             Teachers   Teachers   Teachers   Teachers
Female                       0.77       0.76       0.77       0.74
Black                        0.22       0.21       0.25       0.16
Latino                       0.20       0.19       0.21       0.17
White                        0.50       0.53       0.47       0.60
Asian/Other                  0.08       0.07       0.07       0.08
Graduate Degree              0.68       0.62       0.73       0.49
National Board Certified     0.08       0.06       0.10       0.01
Tenured                      0.74       0.56       1.00       0.00
Birth Year                   (values illegible in source)
Exit CPS                     0.12       0.10       0.07       0.15
Involuntary Exit from CPS    0.04       0.04       0.01       0.07
Teachers                     26,730     22,172     14,824     9,905
Teacher*Year Observations    96,491     44,637     24,968     19,669


Notes. Each cell reports proportion, except birth year which reports mean (standard deviation). Data are for the 2012- 
13 through 2016-17 school years. Data include Chicago Public School teachers present in any year during the study 
period (charter and alternative school teachers are excluded). Graduate Degree includes teachers with a master’s or 
doctorate degree. 
Table 2. Teacher Evaluation Ratings, by Tenure Status


                               All Teachers   Tenured Teachers   Non-tenured Teachers
Panel A: Final Ratings
Unsatisfactory                 0.01           0.01               0.01
Developing                     0.20           0.15               0.28
Proficient                     0.53           0.52               0.54
Excellent                      0.25           0.32               0.18
REACH Score                    312.7          319.7              303.8
                               (38.8)         (37.8)             (38.3)
Panel B: Performance Measures
Classroom Observation          3.14           3.22               3.03
                               (0.45)         (0.44)             (0.44)
VAM (Math)                     0.02           0.05               -0.01
                               (0.89)         (0.87)             (0.92)
VAM (Reading)                  0.02           0.06               -0.04
                               (0.79)         (0.75)             (0.83)
Teachers                       22,172         14,824             9,905
Teacher*Year Observations      44,637         24,968             19,669


Notes. Each cell reports mean (standard deviation), except Final Ratings categories, which report proportions. REACH
Score is the teacher’s summative evaluation score based on multiple performance measures (see Table A1) upon which
a teacher’s formal REACH evaluation rating is based and is on a 100-400 continuous point scale. Teachers whose
REACH Score is below 210 receive an Unsatisfactory rating; teachers whose REACH Score is 210-284 receive a
Developing rating; teachers whose REACH Score is 284-339 receive a Proficient rating; and teachers whose REACH
Score is greater than 339 receive an Excellent rating. A teacher’s Classroom Observation score is based on multiple
classroom observations of a teacher’s instruction, and is measured on a 1-4 integer scale. A teacher’s VAM score is
based on a teacher’s contribution to student achievement growth (in math or reading) and is standardized at the
teacher*year level.
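The rating thresholds in the notes to Table 2 can be written as a simple mapping from REACH score to rating. This is a minimal sketch, not the district's implementation; the function name is hypothetical. Because the notes list 284 in both the Developing and Proficient ranges, the sketch assumes a score of exactly 284 is rated Proficient.

```python
def reach_rating(score: float) -> str:
    """Map a REACH score (100-400 scale) to a rating category.

    Thresholds follow the notes to Table 2. Treating exactly 284 as
    Proficient is an assumption; the notes are ambiguous at that boundary.
    """
    if not 100 <= score <= 400:
        raise ValueError("REACH scores are defined on a 100-400 scale")
    if score < 210:
        return "Unsatisfactory"
    if score < 284:
        return "Developing"
    if score <= 339:
        return "Proficient"
    return "Excellent"


if __name__ == "__main__":
    # Districtwide mean and the below-250 tenured mean cited in the text.
    for example in (312.7, 222.5, 209.9):
        print(example, reach_rating(example))
```

Raising the Unsatisfactory threshold to 230 or 250 points, as considered in the text, corresponds to changing only the first cutoff in this mapping.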
Table 3. Association between Teacher Tenure, Evaluation Ratings and Exit from CPS


                                Any Exit     Involuntary      Any Exit      Involuntary
                                (Year t)     Exit (Year t)    (Year t+1)    Exit (Year t+1)
                                (1)          (2)              (3)           (4)
Tenured                         -.030***     .003             -.067***      -.002
                                (.006)       (.003)           (.008)        (.004)
Unsatisfactory                  .143%*       .210***          .134%*        .049
                                (.040)       (.036)           (.042)        (.038)
Developing                      .035%#       .052***          .054%         .026***
                                (.012)       (.007)           (.015)        (.008)
Proficient                      .010         .004             .008          .005
                                (.008)       (.004)           (.010)        (.005)
Tenured*Unsatisfactory          -2678*#      -A10***          -.071         .043
                                (.043)       (.037)           (.044)        (.046)
Tenured*Developing              -.090***     -1 18%           -.067***      -.030***
                                (.010)       (.006)           (.013)        (.007)
Tenured*Proficient              -.016**      -.010***         -.020**       -.010**
                                (.007)       (.003)           (.009)        (.004)
P-value from F-test:            .000         .000             .136          .025
Unsatisfactory=
Developing=Proficient
P-value from F-test:            .000         .000             .000          .004
Tenure*Unsatisfactory=
Tenure*Developing=
Tenure*Proficient
Adjusted R²                     0.095        0.181            0.097         0.049
Teacher*Year Observations       44,637       44,637           44,637        28,214


Notes. Coefficients reported with robust standard errors (clustered at the school level). All regressions include school-
by-year fixed effects, linear and quadratic polynomials in the REACH Score, and the following teacher characteristics:
race, gender, birth year, education level and National Board certification. Any Exit (Year t) includes teachers who
exited Chicago Public Schools (CPS) by the end of the current school year; Any Exit (Year t+1) includes teachers who
exited CPS by the end of the subsequent school year. Involuntary Exit includes teachers who exited CPS for reasons
other than retirement or resignation. The omitted reference category includes teachers who were rated Excellent in a
given school year. The sample size changes in Column 4 because we have access to involuntary exit data through
2017-18, while the other personnel data are available through 2018-19. Coefficients are statistically significant at the
*10%, **5% and ***1% levels.
Table 4. Nonparametric RD Estimates of the Impact of Unsatisfactory Evaluation Rating on Teacher Exit, by Tenure Status


                              Any Exit           Involuntary Exit   Any Exit           Involuntary Exit
                              (Year t)           (Year t)           (Year t+1)         (Year t+1)
                              (1)      (2)       (3)      (4)       (5)      (6)       (7)      (8)
Panel A: Tenured
Unsatisfactory (relative to   -.078    -.080     -.046    -.058     .179%    .179%     .153**   .137**
Developing)                   (.089)   (.094)    (.072)   (.072)    (.099)   (.101)    (.070)   (.067)
Counterfactual Mean           0.22     0.22      0.08     0.08      0.35     0.36      0.05     0.05
Bandwidth                     28.1     25.3      21.5     20.8      29.9     28.0      25.0     25.0
N (left)                      179      164       143      137       187      179       102      102
N (right)                     409      339       243      07        451      407       223      223
Panel B: Non-tenured
Unsatisfactory (relative to   -.021    -.008     -.017    -.008     -.022    -.010     .041     .078
Developing)                   (.094)   (.094)    (.109)   (.110)    (.079)   (.081)    (.074)   (.095)
Counterfactual Mean           0.43     0.43      0.38     0.38      0.53     0.54      0.10     0.11
Bandwidth                     18.6     18.6      14.3     14.0      24.2     22.8      25.0     25.0
N (left)                      168      169       138      135       200      194       91       91
N (right)                     420      421       299      285       627      585       376      376
Year FE                       x        x         x        x         x        x         x        x
Teacher Xs                             x                  x                  x                  x


Notes. Each column (within a panel) is a separate regression. Coefficients from nonparametric regression discontinuity (RD) reported with robust standard errors
(clustered at the school level). All regressions include controls for the linear running variable, a teacher’s final REACH score (from year t). Teacher Xs include
controls for teacher gender, race/ethnicity, educational attainment, National Board Certification, and birth year. Coefficients are statistically significant at the *10%,
**5% and ***1% levels.
Table 5. Parametric RD Estimates of the Impact of Unsatisfactory Evaluation Rating on Teacher Exit, by Tenure Status


                              Any Exit           Involuntary Exit   Any Exit           Involuntary Exit
                              (Year t)           (Year t)           (Year t+1)         (Year t+1)
                              (1)      (2)       (3)      (4)       (5)      (6)       (7)      (8)
Panel A: Tenured
Unsatisfactory (relative to   -.090    -.109     -.087    -.108     .140     .130      .199**   .185*
Developing)                   (.102)   (.107)    (.070)   (.073)    (.109)   (.110)    (.094)   (.096)
(BW=20)                       [350]    [350]     [350]    [350]     [350]    [350]     [232]    [232]
Unsatisfactory (relative to   -.104    -.129     -.058    -.070     .175*    .154*     .173*    .162%
Developing)                   (.084)   (.083)    (.063)   (.063)    (.096)   (.092)    (.080)   (.083)
(BW=25)                       [495]    [495]     [495]    [495]     [495]    [495]     [325]    [325]
Unsatisfactory (relative to   -.097    -.117     -.052    -.060     .175**   .153*     1L85**   .171**
Developing)                   (.074)   (.074)    (.054)   (.054)    (.084)   (.081)    (.073)   (.075)
(BW=30)                       [651]    [651]     [651]    [651]     [651]    [651]     [426]    [426]
Panel B: Non-tenured
Unsatisfactory (BW=20)        -.079    -.075     -.121    -.114     -.060    -.059     .025     .023
                              (.079)   (.079)    (.078)   (.078)    (.085)   (.086)    (.085)   (.085)
                              [656]    [656]     [656]    [656]     [656]    [656]     [344]    [344]
Unsatisfactory (BW=25)        -.025    -.018     -.064    -.058     -.017    -.012     .035     .036
                              (.074)   (.072)    (.073)   (.073)    (.075)   (.076)    (.077)   (.077)
                              [867]    [867]     [867]    [867]     [867]    [867]     [468]    [468]
Unsatisfactory (BW=30)        -.017    -.006     -.037    -.025     -.009    .001      .071     .081
                              (.068)   (.067)    (.070)   (.070)    (.070)   (.070)    (.069)   (.069)
                              [1083]   [1083]    [1083]   [1083]    [1083]   [1083]    [600]    [600]
Year FE                       x        x         x        x         x        x         x        x
Teacher Xs                             x                  x                  x                  x


Notes. Each cell (within a column and panel) is a separate regression. Coefficients from parametric regression discontinuity (RD) reported with robust standard
errors (clustered at the school level) in parentheses and sample size in brackets. The sample sizes in columns 7 and 8 differ from columns 1-6 (within a panel)
because we have access to involuntary exit data through 2017-18, while the other personnel data are available through 2018-19. All regressions include controls for
the linear running variable (a teacher’s final REACH score from year t, centered around the 210 threshold for Unsatisfactory/Developing) and the centered
running variable interacted with the Unsatisfactory indicator. Teacher Xs include controls for teacher gender, race/ethnicity, educational attainment, National Board
Certification, and birth year. Coefficients are statistically significant at the *10%, **5% and ***1% levels.
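The parametric specification described in the notes to Table 5 (an Unsatisfactory indicator, the centered running variable, and their interaction, estimated within a fixed bandwidth) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' estimation code: all function names are hypothetical, the data are synthetic and noise-free, and year fixed effects, teacher covariates, and school-clustered standard errors are omitted.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def parametric_rd(scores, outcomes, cutoff=210.0, bandwidth=25.0):
    """Coefficient on the Unsatisfactory indicator within the bandwidth."""
    rows, y = [], []
    for score, outcome in zip(scores, outcomes):
        centered = score - cutoff
        if abs(centered) <= bandwidth:
            below = 1.0 if centered < 0 else 0.0  # Unsatisfactory indicator
            rows.append([1.0, below, centered, below * centered])
            y.append(outcome)
    return ols(rows, y)[1]


def synthetic_exit_rate(score, cutoff=210.0):
    """Hypothetical exit rate with a built-in jump of 0.17 at the cutoff."""
    centered = score - cutoff
    below = 1.0 if centered < 0 else 0.0
    return 0.35 + 0.17 * below + 0.001 * centered + 0.002 * below * centered


if __name__ == "__main__":
    scores = [185 + 0.5 * i for i in range(101)]  # 185 to 235
    outcomes = [synthetic_exit_rate(s) for s in scores]
    print(round(parametric_rd(scores, outcomes, bandwidth=25.0), 3))
```

Because the synthetic outcome is an exact linear function of the regressors, the estimator recovers the built-in discontinuity; with real data the estimate would vary with the bandwidth, as the rows of Table 5 illustrate.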
Table 6. Nonparametric RD Estimates of the Impact of Unsatisfactory Evaluation Rating
on Subsequent REACH Score, by Tenure Status


                              REACH Score           Classroom Observation Score
                              (Year t+1)            (Year t+1)
                              (1)        (2)        (3)        (4)
Panel A: Tenured
Unsatisfactory (relative to   -61.4**    57 SE      -0.46*     -0.42**
Developing)                   (27.4)     (20.5)     (0.24)     (0.21)
Counterfactual Mean           257.0      253.7      2.25       2.60
Bandwidth                     7.9        8.9        8.6        9.4
N (left)                      37         40         41         42
N (right)                     33         38         46         52
Panel B: Non-tenured
Unsatisfactory (relative to   -9.2       -11.8      -0.24      -0.13
Developing)                   (16.5)     (18.3)     (0.19)     (0.21)
Counterfactual Mean           271.6      272.4      2:5        2.59
Bandwidth                     14.3       11.7       11.6       11.0
N (left)                      44         40         51         47
N (right)                     127        99         115        105
Year FE                       x          x          x          x
Teacher Xs                               x                     x


Notes. Each column (within a panel) is a separate regression. Coefficients from nonparametric regression discontinuity
(RD) reported with robust standard errors (clustered at the school level). All regressions include controls for the linear
running variable, a teacher’s final REACH score (from year t). Teacher Xs include controls for teacher gender,
race/ethnicity, educational attainment, National Board Certification, and birth year. Coefficients are statistically
significant at the *10%, **5% and ***1% levels.
Table 7. Performance Measures for Replacement and Unsatisfactory-Rated Teachers


                          Replacement Teachers                            Unsatisfactory-Rated Teachers
                          All      New to    From another   From same     All      Exit       Remain
                                   CPS       CPS School     CPS School             (Year t)   (Year t+1)
Panel A: Final Ratings
Unsatisfactory            0.03     0.04      0.04           0.02          1.00     1.00       1.00
Developing                0.32     0.53      0.42           0.24          0.00     0.00       0.00
Proficient                0.51     0.39      0.45           0.55          0.00     0.00       0.00
Excellent                 0.14     0.05      0.09           0.18          0.00     0.00       0.00
REACH Score               304.6    275.6     289.7          311.3         188.4    185.7      191.0
                          (40.8)   (38.5)    (43.0)         (38.3)        (19.3)   (20.8)     (17.4)
Panel B: Performance Measures
Classroom Observation     2.99     2.65      2.82           3.08          1.80     1.75       1.84
                          (0.50)   (0.49)    (0.50)         (0.47)        (0.28)   (0.29)     (0.26)
VAM (Math)                0.14     0.09      -0.40          0.15          1.05     1.15       0.98
                          (1.12)   (1.10)    (1.00)         (1.14)        (1.02)   (0.97)     (1.06)
VAM (Reading)             0.10     0.19      0.03           0.10          0.92     0.81       -1.00
                          (standard deviations illegible in source)
Teachers                  1,717    224       182            1,311         537      263        274


Notes. Each cell reports mean (standard deviation), except Final Ratings categories, which report proportions. REACH
Score is the teacher’s summative evaluation score based on multiple performance measures (see Table A1) upon which
a teacher’s formal REACH evaluation rating is based and is on a 100-400 continuous point scale. A teacher’s
Classroom Observation score is based on multiple classroom observations of a teacher’s instruction, and is measured
on a 1-4 integer scale. A teacher’s VAM score is based on a teacher’s contribution to student achievement growth (in
math or reading). Replacement teachers are defined as teachers who are new to a grade-by-school or subject-by-school
cluster that included at least one teacher who received an Unsatisfactory rating.
Table 8. Instrumental Variables (IV) Estimates of the Impact of Exiting Unsatisfactory-
Rated Teachers on Changes in Teacher Quality


                              REACH        Classroom      VAM          VAM
                              Score        Observation    (Math)       (Reading)

Panel A: Change in Teacher Quality (Year t to Year t+1)
Exit CPS_t                    48.62"       0.48%          0.02         0.40
                              (4.80)       (0.06)         (0.24)       (0.24)
First Stage:
1(REACH < 210)                0.23***      0.23***        0.26***      0.26%
                              (0.01)       (0.01)         (0.02)       (0.02)
IV (F-Statistic)              312.43**     326.89"        136.53**     94.59%
Teacher*Year Observations     64,535       73,357         18,447       21,075

Panel B: Change in Teacher Quality (Year t to Year t+2)
Exit CPS_t                    49.32        0.51           0.45*        -0.10
                              (5.83)       (0.07)         (0.26)       (0.23)
First Stage:
1(REACH < 210)                0.23%        0.23%          0.29%        0.26***
                              (0.02)       (0.01)         (0.03)       (0.03)
IV (F-Statistic)              238.74       245.42***      116.18***    99.45***
Teacher*Year Observations     45,349       53,537         12,895       14,936

Panel C: Change in Teacher Quality (Year t to Year t+2)
Exit CPS_t+1                  38.61***     0.40%          0.34*        0.07
                              (4.57)       (0.05)         (0.20)       (0.16)
First Stage:
1(REACH < 210)                0.30***      0.30***        0.38***      0.37"
                              (0.02)       (0.02)         (0.03)       (0.03)
IV (F-Statistic)              251.70***    259.04***      122.53***    126.51
Teacher*Year Observations     45,349       53,537         12,895       14,936


Notes. Each column within a panel reports a separate 2SLS instrumental variables regression. Teacher quality
outcomes (REACH Score, Classroom Observation, VAM) measured at the cluster-level (i.e., school*year*grade or
school*year*subject cells). In Panels A and B, coefficient on Exit CPS_t reports the impact of exiting an Unsatisfactory-
rated teacher from CPS by the end of year t (the year in which a teacher received an Unsatisfactory rating) on cluster-
specific teacher quality; in Panel C, coefficient on Exit CPS_t+1 reports the impact of exiting an Unsatisfactory-rated
teacher from CPS by the end of year t+1 (the year after a teacher received an Unsatisfactory rating) on cluster-specific
teacher quality. We note that the sample size changes across the performance measures because not all teachers have
each individual performance measure (e.g., only teachers in grades 3-8 reading have reading value-added measures).
All regressions include year fixed effects, controls for the outcome variable measured at year t (the year in which a
teacher received an Unsatisfactory rating), and the following teacher-level covariates: gender, race/ethnicity,
educational attainment, National Board Certification, birth year, and tenure status. Coefficients are statistically
significant at the *10%, **5% and ***1% levels.
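The logic of the instrumental variables estimates in Table 8 can be illustrated with the simplest just-identified case, in which the IV estimate is the ratio of the instrument-outcome covariance to the instrument-treatment covariance. The sketch below is not the authors' 2SLS specification: it omits all controls and fixed effects, the function names are hypothetical, and the data are synthetic. The treatment is continuous here only so that the arithmetic is exact, whereas exit from CPS is binary in the paper.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / len(a)


def iv_estimate(instrument, treatment, outcome):
    """Just-identified IV estimate: cov(z, y) / cov(z, x)."""
    return cov(instrument, outcome) / cov(instrument, treatment)


def ols_slope(treatment, outcome):
    """Bivariate OLS slope: cov(x, y) / var(x)."""
    return cov(treatment, outcome) / cov(treatment, treatment)


# Synthetic data (hypothetical values chosen for illustration).
# z: indicator for scoring below the threshold (the instrument).
# u: unobserved factor that shifts both treatment and outcome.
z = [0, 0, 0, 0, 1, 1, 1, 1]
u = [-0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1]
x = [0.23 * zi + ui for zi, ui in zip(z, u)]      # first stage slope 0.23
y = [48.0 * xi + 5.0 * ui for xi, ui in zip(x, u)]  # true effect 48.0

if __name__ == "__main__":
    print("IV: ", round(iv_estimate(z, x, y), 2))
    print("OLS:", round(ols_slope(x, y), 2))
```

In this construction the unobserved factor is uncorrelated with the instrument, so the IV ratio recovers the built-in effect while the OLS slope is biased upward by the confounding term.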
Table 9. Teacher Performance Measures, by Threshold for Unsatisfactory Rating


                              Tenured                                     Non-Tenured
                              REACH         REACH         REACH          REACH         REACH         REACH
                              Score < 210   Score < 230   Score < 250    Score < 210   Score < 230   Score < 250
Panel A: Final Ratings
Unsatisfactory                1.00          0.56          0.26           1.00          0.36          0.16
Developing                    0.00          0.44          0.74           0.00          0.64          0.84
Proficient                    0.00          0.00          0.00           0.00          0.00          0.00
Excellent                     0.00          0.00          0.00           0.00          0.00          0.00
REACH Score                   184.9         200.6         222.5          192.2         210.8         227.9
                              (21.22)       (24.28)       (26.26)        (16.28)       (17.65)       (19.45)
Panel B: Performance Measures
Classroom Observation         1.75          1.91          2.16           1.84          2.03          2.23
                              (0.29)        (0.32)        (0.36)         (0.26)        (0.27)        (0.30)
VAM (Math)                    0.98          0.85          0.74           1.10          0.78          0.67
                              (0.87)        (0.93)        (0.91)         (1.11)        (1.01)        (0.95)
VAM (Reading)                 0.97          0.16          0.55           0.88          0.77          0.63
                              (1.02)        (1.02)        (1.02)         (1.06)        (0.92)        (0.90)
Teachers                      239           417           900            250           688           1,482
Teacher*Year Observations     275           487           1,069          262           730           1,679


Notes. Each cell reports mean (standard deviation), except Final Ratings categories, which report proportions, for different REACH Score thresholds determining
teacher assignment to an Unsatisfactory annual REACH evaluation rating.
Figure 1. Likelihood of Teacher Exit from CPS, by Year and Tenure Status
Panel B. Any Exit (Year t), Non-Tenured
Panel D. Involuntary Exit (Year t), Non-Tenured
Panel F. Any Exit (Year t+1), Non-Tenured
Panel G. Involuntary Exit (Year t+1), Tenured
Panel H. Involuntary Exit (Year t+1), Non-Tenured
Notes. Each panel shows the exit rates for teachers with different REACH scores. Each point represents the average exit rate
of teachers within a 7-point bin of the REACH score. 


Figure 2. Distribution of REACH Score, by Tenure Status 


Panel A. Tenured Panel B. Non-Tenured 
Notes. We tested for discontinuities at each of the three ratings thresholds and found no statistically significant
discontinuities. For tenured teachers, the p-value from a McCrary (2008) test is 0.836 at the Unsatisfactory/Developing 
threshold; 0.537 at the Developing/Proficient threshold; and 0.279 at the Proficient/Excellent threshold. For non- 
tenured teachers, the p-value is 0.687 at the Unsatisfactory/Developing threshold; 0.683 at the Developing/Proficient 
threshold; and 0.778 at the Proficient/Excellent threshold. 
Figure 3. Probability of Teacher Exit from CPS at the Unsatisfactory/Developing
Threshold, by Tenure Status
Panel C. Involuntary Exit (Year t), Tenured    Panel D. Involuntary Exit (Year t), Non-Tenured
Panel G. Involuntary Exit (Year t+1), Tenured    Panel H. Involuntary Exit (Year t+1), Non-Tenured
Notes. Each panel shows the exit rates for teachers with different REACH scores within 30 points of the
Unsatisfactory/Developing threshold of 210 REACH score points. In each panel, the solid lines are local linear fits; dots are 
within bin averages. The number of bins is allowed to differ to the right and left of the cutoff and is selected using the mimicking 
variance evenly-spaced method (Calonico et al. 2017). The left-hand-side panels limit the sample to tenured teachers; the right- 
hand-side panels limit the sample to non-tenured teachers. 
Figure 4. Distribution of the Difference in Performance Measures between New-to-CPS
Replacement Teachers and Unsatisfactory-Rated Teachers, by REACH Score Thresholds
Panel D. Difference in Reading VAM
Notes. A value below 0 means that the average new-to-CPS replacement teacher is lower performing on that metric
than the Unsatisfactory-rated teacher. For the 210 threshold (n=355 teachers below the threshold), the number of cases 
where this occurs is n=1 for the REACH score, n=10 for the classroom observation score, n=10 for the math VAM, 
and n=11 for the reading VAM. For the 230 threshold (n=806 teachers below the threshold), the number of cases 
where this occurs is n=8 for the REACH score, n=42 for the classroom observation score, n=33 for the math VAM, 
and n=34 for the reading VAM. For the 250 threshold (n=1,810 teachers below the threshold), the number of cases 
where this occurs is n=66 for the REACH score, n=207 for the classroom observation score, n=77 for the math VAM, 
and n=98 for the reading VAM. These sample sizes are different from what is shown in Table 9 because (i) teacher 
performance data does not extend beyond the 2016-17 school year, so we do not observe the performance of 
replacement teachers for Unsatisfactory-rated teachers in 2016-17; and (ii) we have focused on the new-to-CPS labor 
supply, and some schools’ replacement teachers include only transfers from within CPS. 
Appendix Tables & Figures
Table A1. REACH System Teacher Performance Measures and Associated Weights

                                          Grades 3-8 in     Grades K-8 in non-tested    Grades 9-12
                                          tested subject    subject/grade level
Individual value-added measures           20%               n/a                         n/a
based on standardized test scores
Student growth on district-developed      10%               30%                         30%
assessments (“performance tasks”)
Classroom observation ratings             70%               70%                         70%


Notes. Each cell provides the nominal weight assigned to a teacher performance measure used to construct a teacher’s
final REACH score upon which the final REACH evaluation ratings are based. District-developed assessments, which
are written or hands-on assessments specifically designed for the grade and subject of the course, are administered
and scored by teachers at the beginning and the end of the year. These assessments fulfill the state legislative
requirement that all teachers should be evaluated in part based on student growth. Individual value-added measures
are based on the NWEA-MAP. Classroom observation ratings are based on administrator observations of teacher
practice using the Danielson Framework for Teaching. Weights for each of the performance measures have changed
slightly throughout the implementation of REACH, but classroom observation ratings have always been the most
heavily weighted component of the final REACH evaluation rating. Regarding the construction of the REACH
evaluation score: first, approximately 1 in 5 teachers in our sample teach in a tested subject/grade level, so VAMs influence
the ratings of relatively few teachers. Second, even for the teachers who do receive VAMs, the VAM itself only
accounts for 20 percent of the final score. Third, no high school teachers in Chicago receive a VAM. And finally, all
teachers are rated based on their ability to improve student learning on district-created, subject-specific assessments.
Teachers administer and grade their own students’ assessments at the beginning and end of the year, and most teachers
receive perfect marks on this measure.
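The nominal weights in Table A1 can be applied as a weighted average of component scores. The sketch below is a hedged illustration rather than the district's scoring procedure: it assumes each component has already been placed on a common 1-4 scale and that the weighted average is multiplied by 100 to reach the 100-400 REACH scale. The group labels and function name are hypothetical.

```python
# Nominal weights from Table A1, keyed by hypothetical teacher-group labels.
WEIGHTS = {
    "grades_3_8_tested": {
        "value_added": 0.20,
        "performance_tasks": 0.10,
        "observation": 0.70,
    },
    "grades_k_8_untested": {"performance_tasks": 0.30, "observation": 0.70},
    "grades_9_12": {"performance_tasks": 0.30, "observation": 0.70},
}


def reach_score(group: str, components: dict) -> float:
    """Weighted REACH score on a 100-400 scale.

    Assumes every component score is on a 1-4 scale and that the final
    score equals 100 times the weighted average (an assumption made here).
    """
    weights = WEIGHTS[group]
    if set(components) != set(weights):
        raise ValueError(f"expected components {sorted(weights)}")
    for name, value in components.items():
        if not 1 <= value <= 4:
            raise ValueError(f"{name} must be on a 1-4 scale")
    return 100 * sum(weights[name] * components[name] for name in weights)


if __name__ == "__main__":
    # A high school teacher with perfect performance-task marks and a
    # hypothetical observation score of 3.0.
    example = {"performance_tasks": 4.0, "observation": 3.0}
    print(round(reach_score("grades_9_12", example), 1))
```

Under these assumptions, a teacher with perfect marks on the performance tasks still needs an observation score above 1.0 to clear the 210-point threshold only narrowly at low observation levels, which is consistent with the note that observation ratings dominate the final score.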
Table A2. Timing of REACH System Evaluation Cycle


Event                                     Year t                  Year t+1
REACH data collected                      x
Formal REACH rating awarded                                       x (October/November)
Labor market response to information      x (end of period t)
Labor market response to formal rating                            x (end of period t+1)

Notes. The REACH data collected include classroom observation ratings, measures of student growth on
district-developed assessments (“performance tasks”), and (where available) individual value-added measures based on
standardized test scores (see Table A1). For high-rated tenured teachers (those with prior REACH ratings of
Proficient or Distinguished, which is the vast majority of tenured teachers), the four required classroom observations
occur over a two-year evaluation period. For low-rated tenured teachers (those with prior ratings of Unsatisfactory
or Developing) and all non-tenured teachers, the four required observations occur in a single academic year.
Table A3. Estimated Discontinuities in Teacher Characteristics at the REACH Evaluation
Rating Thresholds 


                                Unsatisfactory/   Developing/       Proficient/
                                Developing        Proficient        Excellent
Teacher characteristic          Threshold         Threshold         Threshold
P(teacher = black)              0.01              0.01              -0.02
                                (0.07)            (0.02)            (0.02)
P(teacher = white)              0.00              -0.01             -0.02
                                (0.07)            (0.02)            (0.02)
P(teacher = female)             -0.01             0.00              0.00
                                (0.07)            (0.02)            (0.01)
P(teacher holds grad degree)    0.06              -0.01             0.00
                                (0.07)            (0.02)            (0.02)
P(teacher = National Board)     0.00              -0.02***          0.01
                                (0.02)            (0.01)            (0.01)
Birth year                      -1.95             0.01              -0.53*
                                (1.62)            (0.39)            (0.32)


Note. Each cell reports results from a separate nonparametric RD regression where the outcome is a specific teacher 
characteristic and the coefficient is the effect of being below the threshold. Regressions include only year fixed effects 
as controls and are restricted to a bandwidth of 25 points around the threshold. Robust standard errors are in 
parentheses. Coefficients are statistically significant at the *10%, **5% and ***1% levels. 
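
The balance check described in this note can be sketched as follows. This is a simplified illustration under stated assumptions, not the estimation code behind the table: it uses a uniform kernel within the 25-point bandwidth, fits a separate line on each side of the cutoff, and omits the year fixed effects and robust standard errors. The function names and the simulated data are hypothetical.

```python
import random
from statistics import fmean


def boundary_intercept(x, y):
    """OLS intercept of y on x, where x is the score centered at the cutoff."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return y_bar - (sxy / sxx) * x_bar


def rd_below_threshold(scores, outcomes, cutoff, bandwidth=25.0):
    """Local linear RD estimate (uniform kernel) of the jump for being below the cutoff.

    Returns the difference in fitted values at the cutoff, below minus above.
    """
    below_x, below_y, above_x, above_y = [], [], [], []
    for score, outcome in zip(scores, outcomes):
        centered = score - cutoff
        if abs(centered) > bandwidth:
            continue  # outside the estimation window
        if centered < 0:
            below_x.append(centered)
            below_y.append(outcome)
        else:
            above_x.append(centered)
            above_y.append(outcome)
    return boundary_intercept(below_x, below_y) - boundary_intercept(above_x, above_y)


# Simulated covariate that varies smoothly with the score (no true discontinuity).
rng = random.Random(0)
scores = [rng.uniform(235.0, 335.0) for _ in range(2000)]
birth_year = [1975.0 + 0.05 * (s - 285.0) + rng.gauss(0.0, 8.0) for s in scores]
estimate = rd_below_threshold(scores, birth_year, cutoff=285.0)
print(f"Estimated discontinuity in the covariate: {estimate:.2f}")
```

With a uniform kernel, fitting separate lines on each side is numerically equivalent to a single regression with a below-threshold indicator interacted with the centered score. An estimate close to zero is what one would expect if teachers cannot precisely sort around the threshold.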




Table A4. Nonparametric RD Estimates of the Impact of Proficient Evaluation Rating on Teacher Exit, by Tenure Status 


                            Any Exit          Involuntary Exit  Any Exit          Involuntary Exit
                            (Year t)          (Year t)          (Year t+1)        (Year t+1)
                            (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)

Panel A: Tenured
Proficient (relative to     .007     .004     .000     -.001    .033     .030     -.012    -.012
Developing)                 (.013)   (.013)   (.007)   (.007)   (.020)   (.020)   (.010)   (.010)
Counterfactual Mean         0.07     0.07     0.01     0.01     0.14     0.14     0.01     0.01
Bandwidth                   32.5     31.3     23.7     23.4     26.1     25.5     16.7     16.6
N (left)                    2,786    2,704    2,290    2,264    3,462    2,407    1,268    1,263
N (right)                   7,212    6,865    4,870    4,792    5,863    5,341    2,227    2,211
Panel B: Non-tenured
Proficient (relative to     .019     .020     .014     .013     .032*    .034*    .001     .001
Developing)                 (.016)   (.016)   (.012)   (.012)   (.019)   (.019)   (.012)   (.012)
Counterfactual Mean         0.12     0.12     0.03     0.03     0.22     0.22     0.03     0.03
Bandwidth                   28.3     28.3     25.1     24.6     32.9     32.9     21.8     21.6
N (left)                    3,462    3,462    3,207    3,171    3,810    3,810    2,111    2,100
N (right)                   5,863    5,863    5,176    5,066    6,821    6,821    3,348    3,319
Year FE                     x        x        x        x        x        x        x        x
Teacher Xs                           x                 x                 x                 x

Notes. Each column (within a panel) is a separate regression. Coefficients from nonparametric regression discontinuity (RD) models are reported with robust standard errors
(clustered at the school level) in parentheses. All regressions include controls for the linear running variable, a teacher’s final REACH score (from year t). Teacher Xs include
controls for teacher gender, race/ethnicity, educational attainment, National Board Certification, and birth year. Coefficients are statistically significant at the *10%,
**5% and ***1% levels.
Table A5. Nonparametric RD Estimates of the Impact of Excellent Evaluation Rating on Teacher Exit, by Tenure Status


                            Any Exit          Involuntary Exit  Any Exit          Involuntary Exit
                            (Year t)          (Year t)          (Year t+1)        (Year t+1)
                            (1)      (2)      (3)      (4)      (5)      (6)      (7)      (8)

Panel A: Tenured
Excellent (relative to      -.015    -.011    .002     .003     -.012    -.009    -.001    -.009
Proficient)                 (.011)   (.011)   (.004)   (.004)   (.016)   (.015)   (.008)   (.008)
Counterfactual Mean         0.05     0.05     0.01     0.01     0.11     0.11     0.01     0.01
Bandwidth                   14.1     15.5     20.8     20.3     15.2     17.0     19.4     20.3
N (left)                    3,540    3,966    5,441    5,311    3,899    4,379    2,803    2,960
N (right)                   3,265    3,570    4,578    4,492    3,498    3,873    2,183    2,256
Panel B: Non-tenured
Excellent (relative to      -.052**  -.054**  -.003    .002     -.069**  -.074**  .019     .019
Proficient)                 (.022)   (.022)   (.008)   (.008)   (.031)   (.032)   (.013)   (.012)
Counterfactual Mean         0.09     0.09     0.01     0.01     0.18     0.18     0.02     0.02
Bandwidth                   12.0     11.6     16.2     16.5     10.9     10.3     12.5     13.1
N (left)                    1,818    1,743    2,543    2,615    1,632    1,517    1,299    1,383
N (right)                   1,479    1,440    1,851    1,877    1,377    1,314    1,014    1,057
Year FE                     x        x        x        x        x        x        x        x
Teacher Xs                           x                 x                 x                 x


Notes. Each column (within a panel) is a separate regression. Coefficients from nonparametric regression discontinuity (RD) models are reported with robust standard errors
(clustered at the school level) in parentheses. All regressions include controls for the linear running variable, a teacher’s final REACH score (from year t). Teacher Xs include
controls for teacher gender, race/ethnicity, educational attainment, National Board Certification, and birth year. Coefficients are statistically significant at the *10%,
**5% and ***1% levels.
Figure A1. Probability of Receiving a Final REACH Evaluation Rating Given the REACH Score
Notes. Each panel shows, among teachers with REACH scores within 30 points of a given evaluation rating threshold,
the share at each score who received a given rating.
Figure A2. Probability of Teacher Exit from CPS at the Developing/Proficient Threshold, by Tenure Status
Panel A. Any Exit (Year t), Tenured
Panel B. Any Exit (Year t), Non-Tenured
Panel G. Involuntary Exit (Year t+1), Tenured
Panel H. Involuntary Exit (Year t+1), Non-Tenured
Notes. Each panel shows the exit rates for teachers with different REACH scores within 30 points of the Developing/Proficient
threshold of 285. In each panel, the solid lines are local linear fits; dots are within bin averages. The number of bins is allowed 
to differ to the right and left of the cutoff and is selected using the mimicking variance evenly spaced method (Calonico et al. 
2017). The left-hand-side panels limit the sample to tenured teachers; the right-hand-side panels limit the sample to non-tenured 
teachers. 
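
The binned averages plotted as dots in figures of this kind can be sketched as below. This is a simplified illustration: the number of bins per side is fixed by the caller, whereas the figures select it with the data-driven mimicking variance evenly spaced method. The function name and the toy data are hypothetical.

```python
def binned_means(scores, outcomes, cutoff, window=30.0, bins_per_side=10):
    """Average the outcome within evenly spaced score bins on each side of the cutoff.

    Returns (bin_midpoint, mean_outcome, n) tuples ordered by score. Bins never
    straddle the cutoff, so a jump at the cutoff is not averaged away.
    """
    width = window / bins_per_side
    totals = {}
    for score, outcome in zip(scores, outcomes):
        centered = score - cutoff
        if centered < -window or centered >= window:
            continue  # outside the plotting window
        index = int(centered // width)  # negative indices lie left of the cutoff
        total, count = totals.get(index, (0.0, 0))
        totals[index] = (total + outcome, count + 1)
    return [
        (cutoff + (index + 0.5) * width, total / count, count)
        for index, (total, count) in sorted(totals.items())
    ]


# Toy data: exit indicators (1 = exited) for five teachers near a cutoff of 285.
toy_scores = [283.0, 284.0, 286.0, 287.0, 320.0]
toy_exits = [1, 0, 0, 0, 1]
for midpoint, mean_exit, n in binned_means(toy_scores, toy_exits, cutoff=285.0):
    print(midpoint, mean_exit, n)
```

The teacher with a score of 320 falls outside the 30-point window and is dropped, mirroring the restriction described in the figure notes.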
Figure A3. Probability of Teacher Exit from CPS at the Proficient/Excellent Threshold, by Tenure Status
Panel A. Any Exit (Year t), Tenured
Panel B. Any Exit (Year t), Non-Tenured
Panel G. Involuntary Exit (Year t+1), Tenured
Panel H. Involuntary Exit (Year t+1), Non-Tenured
Notes. Each panel shows the exit rates for teachers with different REACH scores within 30 points of the
Proficient/Excellent threshold of 340. In each panel, the solid lines are local linear fits; dots are within bin averages. 
The number of bins is allowed to differ to the right and left of the cutoff and is selected using the mimicking variance 
evenly spaced method (Calonico et al. 2017). The left-hand-side panels limit the sample to tenured teachers; the right- 
hand-side panels limit the sample to non-tenured teachers. 
Figure A4. Distribution of Cluster Size
Notes. The figure shows the distribution of the number of teachers in a grade-by-school or subject-by-school cluster in
year t+1 that contained at least one Unsatisfactory-rated teacher in year t. There are 417 clusters with at least one
Unsatisfactory-rated teacher out of 22,864 total clusters in CPS. The average cluster contains 4.37 teachers with a
standard deviation of 4.23.